CN102982099A - Personalized concurrent word segmentation processing system and processing method thereof - Google Patents
Personalized concurrent word segmentation processing system and processing method thereof
- Publication number
- CN102982099A
- Authority
- CN
- China
- Prior art keywords
- word segmentation
- word
- dictionary
- personalized
- participle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to a personalized concurrent word segmentation processing system and a processing method thereof. The system comprises a word segmentation request module, a word segmentation module based on a personalized word segmentation dictionary, a word segmentation module based on a general word segmentation dictionary, a control module, and a high-speed word segmentation processing module. A user's word segmentation request is sent simultaneously to the personalized-dictionary module and the general-dictionary module. If the personalized-dictionary module hits, the segmentation result is returned to the request module through the control module, and the request sent to the general-dictionary module is interrupted; otherwise, the control module dynamically updates the personalized dictionary according to the earliest-and-least-used principle, using the segmentation result of the general-dictionary module. The system and method maintain word segmentation accuracy while greatly improving segmentation efficiency, meeting mobile users' need for efficient queries.
Description
Technical field
The invention belongs to the field of mobile search and Chinese information processing, and specifically relates to a personalized concurrent word segmentation processing system and a processing method thereof.
Background technology
A word is the smallest unit that carries a definite meaning, and word segmentation is the process of splitting a sentence into its words according to their meaning. Natural language understanding and processing generally operate on words, yet Chinese text, whether written or stored in a computer, uses the character as its basic grapheme, with no explicit boundary between words. Chinese word segmentation is therefore a fundamental step in Chinese information processing, and also a key technology and a difficulty in tasks such as text classification, information retrieval, information filtering, automatic document indexing, and automatic summarization.
The performance of a word segmentation algorithm is judged mainly on the following aspects: segmentation speed and accuracy, ambiguity resolution, new-word recognition, whether a corpus or rule base is required, algorithmic efficiency, technical maturity, and difficulty of implementation. Commonly used Chinese word segmentation algorithms currently fall into four classes: dictionary-based string matching algorithms, statistics-based algorithms, understanding-based algorithms, and combined algorithms.
Dictionary-based string matching algorithms match the Chinese character string to be segmented against the entries of a sufficiently large machine dictionary according to some strategy; if a string is found in the dictionary, the match succeeds and a word is identified. Algorithms of this class are simple, fast, and efficient, but their accuracy is relatively poor and they depend heavily on how the dictionary is organized: the quality of the dictionary structure directly affects segmentation speed, the dictionary's space utilization, and the cost of maintaining it.
Statistics-based algorithms treat a word as a stable combination of characters: the probability that characters co-occur adjacently in context reflects how credible it is that they form a word. The frequencies of adjacent character combinations in a corpus are therefore counted, and statistics such as word frequency, mutual information, and t-test difference are computed and used as the basis for segmentation. Algorithms of this class are fairly accurate and of moderate speed, but they require a large-scale corpus.
Understanding-based algorithms perform syntactic and semantic analysis during segmentation and use syntactic and semantic information to resolve ambiguity. They require a large amount of linguistic knowledge and information; their accuracy is high, but they are slow, computationally complex, and hard to implement.
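To make the dictionary-based string matching class concrete, the following is a minimal sketch of forward maximum matching, one common matching strategy. The patent does not say which strategy its modules use, so the function name `forward_max_match`, the `max_len` parameter, and the demo dictionary are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` greedily, preferring the longest dictionary entry at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical demo dictionary, not taken from the patent.
demo_dictionary = {"移动", "搜索", "中文", "分词", "移动搜索"}
segments = forward_max_match("移动搜索中文分词", demo_dictionary)
```

The cost of each lookup grows with the size and layout of the dictionary, which is the dependence on dictionary organization that the paragraph above points out.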
The algorithms above have gradually matured in desktop and Internet environments, but they are not necessarily well suited to the large mobile Internet market. As the mobile Internet develops, obtaining information online from mobile terminals is becoming the prevailing trend. In mobile search, however, clients place high demands on real-time query response; with the dictionary-based string matching approach described above, query latency can be considerable and the query experience poor. Moreover, mobile network access tends to concentrate in a few specific time periods. When many users search at the same time and every query must first be segmented against the dictionary, the load on the word segmentation module in those periods rises sharply, which further increases query latency and degrades the user experience.
Large volumes of query history show that, within a given time period, a mobile user's query keywords are relatively concentrated in a certain range. This concentration can be exploited by allocating a small dedicated storage area to each terminal user to hold the words contained in that user's historical queries, forming a personalized word segmentation dictionary for the user. During segmentation, the personalized dictionary of the individual user and the general word segmentation dictionary shared by all users are processed concurrently along two paths. This can greatly improve the efficiency of segmenting the user's query and effectively compensates for the shortcomings that traditional dictionary-based string matching shows when applied to mobile search.
Summary of the invention
The technical problem solved by the invention is to provide a personalized concurrent word segmentation processing system and processing method that meet users' needs for efficient query processing and personalized queries in mobile search and improve the query experience.
To solve the above technical problem, the invention adopts the following technical solution:
A personalized concurrent word segmentation processing system, characterized in that it comprises a word segmentation request module, a word segmentation module based on a personalized word segmentation dictionary (the personalized-dictionary module), a word segmentation module based on a general word segmentation dictionary (the general-dictionary module), a control module, and a high-speed word segmentation processing module;
The word segmentation request module sends the user's query content synchronously and in parallel to the personalized-dictionary module and the general-dictionary module for segmentation, and receives from the control module the returned segmentation result together with the trigger information for starting the next round of segmentation;
The personalized-dictionary module matches substrings of the user's query content against the entries of the personalized dictionary to complete segmentation. The personalized dictionary stores the user's query keywords over a period of time; it is created when the user first uses the system and gains entries gradually with use, so its number of entries is small;
The general-dictionary module matches substrings of the user's query content against the entries of the general dictionary to complete segmentation. The general dictionary stores all entries shared by all users; its information is complete and its number of entries is very large;
The control module synchronizes the processing of the two word segmentation modules; when the current round of segmentation ends, it returns the segmentation result and trigger information to the request module, triggering the next round;
The user's word segmentation request is sent through the request module simultaneously to the personalized-dictionary module and the general-dictionary module; both modules send their processing information to the control module, which in turn delivers the segmentation result and trigger information to the request module. The high-speed word segmentation processing module is connected between the personalized-dictionary module and the general-dictionary module.
A personalized concurrent word segmentation processing method, characterized in that the user's word segmentation request is sent simultaneously to the personalized-dictionary module and the general-dictionary module. If the personalized-dictionary module hits, the segmentation result is returned to the request module through the control module, and the request sent to the general-dictionary module is interrupted; otherwise, the control module dynamically updates the personalized dictionary according to the earliest-and-least-used principle, based on the segmentation result of the general-dictionary module.
The concrete steps of the above personalized concurrent word segmentation processing method are as follows:
Step 1: In the personalized-dictionary module, determine whether the received user query word is present in the personalized dictionary; if so, go to Step 2, otherwise go to Step 3;
Step 2: Update the related information of the hit query word in the personalized dictionary, such as its access frequency and most recent access time, and go to Step 4;
Step 3: The control module sends the entries obtained by the general-dictionary module to the personalized-dictionary module, which adds them to the personalized dictionary and initializes their related information; if the personalized dictionary is full, some entries are evicted according to the earliest-and-least-used principle;
Step 4: The control module returns to the request module the current segmentation result and the start position for the next round, as fed back by the personalized-dictionary module or the general-dictionary module, and the next round of segmentation begins.
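Steps 1 to 4 can be sketched as a loop in which every round returns one word plus the start position of the next round. This is an illustrative, single-threaded reading only: the helper names (`match_longest`, `segment_round`, `segment`), the longest-match strategy, the `max_len` limit, and the handling of characters found in neither dictionary are assumptions, and eviction from a full dictionary is left out for brevity.

```python
import time


def match_longest(text, start, entries, max_len=4):
    """Return the longest entry of `entries` that begins at `start`, or None."""
    for size in range(min(max_len, len(text) - start), 0, -1):
        candidate = text[start:start + size]
        if candidate in entries:
            return candidate
    return None


def segment_round(text, start, personal, general):
    """One round of Steps 1-4: return (word, next_start, source)."""
    word = match_longest(text, start, personal)          # Step 1: look in the personalized dictionary
    if word is not None:
        personal[word]["freq"] += 1                      # Step 2: refresh frequency and access time
        personal[word]["last_access"] = time.time()
        source = "personal"
    else:
        word = match_longest(text, start, general)       # Step 3: use the general-dictionary result
        if word is None:
            # Assumption: an unknown character is emitted as-is and not learned.
            word, source = text[start], "unknown"
        else:
            personal[word] = {"freq": 1, "last_access": time.time()}
            source = "general"
    return word, start + len(word), source               # Step 4: result plus next start position


def segment(text, personal, general):
    """Run rounds until the whole query has been consumed."""
    pos, out = 0, []
    while pos < len(text):
        word, pos, source = segment_round(text, pos, personal, general)
        out.append((word, source))
    return out
```

Running `segment` twice on the same query with an initially empty `personal` dictionary shows the intended effect: the first pass is served by the general dictionary, the second by the personalized one.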
The concrete steps of the dynamic update method for the personalized dictionary are as follows:
Step 1: Determine whether the entry is present in the personalized dictionary; if so, perform Step 2, otherwise perform Step 3;
Step 2: Increase the entry's access frequency in the personalized dictionary by 1, then perform Step 6;
Step 3: Determine whether the capacity of the personalized dictionary has reached the threshold; if so, perform Step 4, otherwise perform Step 5;
Step 4: Delete from the personalized dictionary the word with the earliest access time and the lowest access frequency;
Step 5: Add the entry at the appropriate position in the personalized dictionary, initialize its access frequency to 1, and initialize its most recent access time to the current time;
Step 6: Processing ends; the update of one entry in the personalized dictionary is complete.
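The six update steps map naturally onto a small bounded dictionary. The sketch below is one possible reading, not the patented implementation: the class name `PersonalDictionary`, the `capacity` and `now` parameters, and the storage layout are hypothetical. The patent does not say how "earliest access time" and "lowest access frequency" are combined, so the code assumes lowest frequency is compared first with earliest access time as the tie-breaker. It also refreshes the access time on a hit, which follows Step 2 of the processing method; the update method itself mentions only the frequency.

```python
import time


class PersonalDictionary:
    """Bounded per-user dictionary with frequency and recency bookkeeping."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # word -> [access_frequency, last_access_time]

    def update(self, word, now=None):
        now = time.time() if now is None else now
        if word in self.entries:                    # Step 1: entry already present?
            self.entries[word][0] += 1              # Step 2: frequency + 1
            self.entries[word][1] = now             # (assumed) refresh access time too
            return                                  # Step 6: done
        if len(self.entries) >= self.capacity:      # Step 3: capacity threshold reached?
            # Step 4 (assumed ordering): lowest frequency first, then earliest access.
            victim = min(self.entries, key=lambda w: tuple(self.entries[w]))
            del self.entries[victim]
        self.entries[word] = [1, now]               # Step 5: insert with frequency 1
```

For example, with `capacity=2`, updating "a", "b", "a", "c" in that order evicts "b", because "a" has been accessed twice by the time "c" arrives.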
Compared with the prior art, the invention has the following beneficial effects. Segmentation with the personalized dictionary achieves a high hit rate and high efficiency, and the proposed personalized concurrent word segmentation method greatly improves the system's segmentation efficiency while maintaining segmentation accuracy, meeting mobile users' need for efficient queries. In addition, because each user's personalized dictionary holds that user's most recent and most frequently accessed query words, the user's interests can be obtained directly from the dictionary without a learning process, which makes it easy to extract a user interest model and provides a reliable basis for mining shifts in user interest.
Embodiment
The invention is described in further detail below.
The processing system of the invention comprises a word segmentation request module, a word segmentation module based on a personalized word segmentation dictionary (the personalized-dictionary module), a word segmentation module based on a general word segmentation dictionary (the general-dictionary module), a control module, and a high-speed word segmentation processing module.
The word segmentation request module sends the user's query content synchronously and in parallel to the personalized-dictionary module and the general-dictionary module for segmentation, and receives from the control module the returned segmentation result together with the trigger information for starting the next round of segmentation;
The personalized-dictionary module matches substrings of the user's query content against the entries of the personalized dictionary to complete segmentation. The personalized dictionary stores the user's query keywords over a period of time; it is created when the user first uses the system and gains entries gradually with use, so its number of entries is small;
The general-dictionary module matches substrings of the user's query content against the entries of the general dictionary to complete segmentation. The general dictionary stores all entries shared by all users; its information is complete and its number of entries is very large;
The control module synchronizes the processing of the two word segmentation modules. Specifically, when an entry hits in the personalized dictionary, the personalized-dictionary module sends an interrupt signal through the control module to the general-dictionary module, interrupting that module's processing of the current string; when an entry is not present in the personalized dictionary, the control module updates the personalized dictionary according to the earliest-and-least-used principle, based on the segmentation result of the general-dictionary module; when the current round of segmentation ends, the control module returns the segmentation result and trigger information to the request module, triggering the next round;
The user's word segmentation request is sent through the request module simultaneously to the personalized-dictionary module and the general-dictionary module; both modules send their processing information to the control module, which in turn delivers the segmentation result and trigger information to the request module. The high-speed word segmentation processing module is connected between the personalized-dictionary module and the general-dictionary module.
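The interrupt behaviour of the control module described above can be illustrated with two concurrent lookups and a cancellation flag. This is a sketch under stated assumptions: the names `general_lookup` and `control`, the use of `threading.Event` as the interrupt signal, and the linear scan standing in for the general-dictionary module are hypothetical and not taken from the patent. In standard CPython the threads illustrate the control flow rather than a real speedup.

```python
import threading


def general_lookup(word, general, cancel, result):
    """Stand-in for the general-dictionary module: scan entries, stop if interrupted."""
    for entry in general:
        if cancel.is_set():          # interrupt signal relayed by the control module
            return
        if entry == word:
            result["general"] = entry
            return


def control(word, personal, general):
    """Dispatch the lookup on both paths; a personalized hit cancels the general path."""
    cancel = threading.Event()
    result = {}
    worker = threading.Thread(
        target=general_lookup, args=(word, general, cancel, result)
    )
    worker.start()                   # general-dictionary path runs concurrently
    if word in personal:             # personalized-dictionary path hits
        cancel.set()                 # interrupt the general-dictionary path
        worker.join()
        return word, "personal"
    worker.join()                    # miss: wait for the general-dictionary result
    return result.get("general"), "general"
```

On a miss, the returned general-dictionary result is what the control module would then pass to the personalized dictionary's update procedure.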
In the personalized concurrent word segmentation processing method, the user's word segmentation request is sent simultaneously to the personalized-dictionary module and the general-dictionary module. If the personalized-dictionary module hits, the segmentation result is returned to the request module through the control module, and the request sent to the general-dictionary module is interrupted; otherwise, the control module dynamically updates the personalized dictionary according to the earliest-and-least-used principle, based on the segmentation result of the general-dictionary module.
The concrete steps of the above personalized concurrent word segmentation processing method are as follows:
Step 1: In the personalized-dictionary module, determine whether the received user query word is present in the personalized dictionary; if so, go to Step 2, otherwise go to Step 3;
Step 2: Update the related information of the hit query word in the personalized dictionary, such as its access frequency and most recent access time, and go to Step 4;
Step 3: The control module sends the entries obtained by the general-dictionary module to the personalized-dictionary module, which adds them to the personalized dictionary and initializes their related information; if the personalized dictionary is full, some entries are evicted according to the earliest-and-least-used principle;
Step 4: The control module returns to the request module the current segmentation result and the start position for the next round, as fed back by the personalized-dictionary module or the general-dictionary module, and the next round of segmentation begins.
The concrete steps of the dynamic update method for the personalized dictionary are as follows:
Step 1: Determine whether the entry is present in the personalized dictionary; if so, perform Step 2, otherwise perform Step 3;
Step 2: Increase the entry's access frequency in the personalized dictionary by 1, then perform Step 6;
Step 3: Determine whether the capacity of the personalized dictionary has reached the threshold; if so, perform Step 4, otherwise perform Step 5;
Step 4: Delete from the personalized dictionary the word with the earliest access time and the lowest access frequency;
Step 5: Add the entry at the appropriate position in the personalized dictionary, initialize its access frequency to 1, and initialize its most recent access time to the current time;
Step 6: Processing ends; the update of one entry in the personalized dictionary is complete.
The general-dictionary module ensures the segmentation accuracy of the system and provides the basis for updating the personalized dictionary. The general word segmentation dictionary used by this module is shared by all users and contains the common words required for segmentation.
The update strategy ensures that the personalized dictionary always stores the words the user has accessed most frequently and queried most recently, which makes it easy to extract a user interest model and provides a reliable basis for mining shifts in user interest. Moreover, because the personalized dictionary is specific to each user and stores that user's historical query words over a period of time, its content remains fairly stable. If the user's interests do not shift significantly within that period, the personalized-dictionary module achieves a very high hit rate, so segmentation based on this dictionary is far more efficient than segmentation based on the general dictionary. The general dictionary, in turn, covers the common words that all users need for segmentation, so its recall is better than that of the personalized dictionary. The two-path concurrent segmentation proposed by the invention therefore combines the strengths of both approaches while offsetting their weaknesses, effectively improving segmentation efficiency while maintaining accuracy.
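The efficiency argument can be made explicit with a rough cost model. The model is an illustration added here, not part of the patent, and every symbol in it is an assumption: $h$ is the hit rate of the personalized dictionary, $t_p$ and $t_g$ are the lookup times in the personalized and general dictionaries with $t_p < t_g$, and $t_u$ is the cost of updating the personalized dictionary after a miss. Because both modules start at the same time, a hit costs about $t_p$ and a miss costs about $t_g$ plus the update.

```latex
\begin{align}
  E[T] &\approx h\,t_p + (1-h)\,(t_g + t_u) \\
  E[T] < t_g \;&\Longleftrightarrow\; h\,(t_g - t_p) > (1-h)\,t_u
\end{align}
```

Under these assumptions the two-path scheme beats a general-dictionary-only lookup whenever the time saved on hits outweighs the update overhead paid on misses, which matches the text's point that the gain depends on a high hit rate.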
Claims (4)
1. A personalized concurrent word segmentation processing system, characterized in that it comprises a word segmentation request module, a word segmentation module based on a personalized word segmentation dictionary (the personalized-dictionary module), a word segmentation module based on a general word segmentation dictionary (the general-dictionary module), a control module, and a high-speed word segmentation processing module;
The word segmentation request module sends the user's query content synchronously and in parallel to the personalized-dictionary module and the general-dictionary module for segmentation, and receives from the control module the returned segmentation result together with the trigger information for starting the next round of segmentation;
The personalized-dictionary module matches substrings of the user's query content against the entries of the personalized dictionary to complete segmentation;
Wherein the personalized dictionary stores the user's query keywords over a period of time; it is created when the user first uses the system and gains entries gradually with use, so its number of entries is small;
The general-dictionary module matches substrings of the user's query content against the entries of the general dictionary to complete segmentation;
Wherein the general dictionary stores all entries shared by all users; its information is complete and its number of entries is very large;
The control module synchronizes the processing of the two word segmentation modules; when the current round of segmentation ends, it returns the segmentation result and trigger information to the request module, triggering the next round;
The user's word segmentation request is sent through the request module simultaneously to the personalized-dictionary module and the general-dictionary module; both modules send their processing information to the control module, which in turn returns the segmentation result and trigger information to the request module; the high-speed word segmentation processing module is connected between the personalized-dictionary module and the general-dictionary module.
2. A processing method of the personalized concurrent word segmentation processing system according to claim 1, characterized in that: the user's word segmentation request is sent simultaneously to the personalized-dictionary module and the general-dictionary module; if the personalized-dictionary module hits, the segmentation result is returned to the request module through the control module, and the request sent to the general-dictionary module is interrupted; otherwise, the control module dynamically updates the personalized dictionary according to the earliest-and-least-used principle, based on the segmentation result of the general-dictionary module.
3. The processing method of the personalized concurrent word segmentation processing system according to claim 1 or 2, characterized in that the concrete steps of the personalized concurrent word segmentation processing method are as follows:
Step 1: In the personalized-dictionary module, determine whether the received user query word is present in the personalized dictionary; if so, go to Step 2, otherwise go to Step 3;
Step 2: Update the related information of the hit query word in the personalized dictionary, such as its access frequency and most recent access time, and go to Step 4;
Step 3: The control module sends the entries obtained by the general-dictionary module to the personalized-dictionary module, which adds them to the personalized dictionary and initializes their related information; if the personalized dictionary is full, some entries are evicted according to the earliest-and-least-used principle;
Step 4: The control module returns to the request module the current segmentation result and the start position for the next round, as fed back by the personalized-dictionary module or the general-dictionary module, and the next round of segmentation begins.
4. The processing method of the personalized concurrent word segmentation processing system according to claim 3, characterized in that the concrete steps of the dynamic update method for the personalized dictionary are as follows:
Step 1: Determine whether the entry is present in the personalized dictionary; if so, perform Step 2, otherwise perform Step 3;
Step 2: Increase the entry's access frequency in the personalized dictionary by 1, then perform Step 6;
Step 3: Determine whether the capacity of the personalized dictionary has reached the threshold; if so, perform Step 4, otherwise perform Step 5;
Step 4: Delete from the personalized dictionary the word with the earliest access time and the lowest access frequency;
Step 5: Add the entry at the appropriate position in the personalized dictionary, initialize its access frequency to 1, and initialize its most recent access time to the current time;
Step 6: Processing ends; the update of one entry in the personalized dictionary is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210435504.7A CN102982099B (en) | 2012-11-05 | 2012-11-05 | Personalized concurrent word segmentation processing system and processing method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210435504.7A CN102982099B (en) | 2012-11-05 | 2012-11-05 | Personalized concurrent word segmentation processing system and processing method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102982099A (en) | 2013-03-20 |
CN102982099B (en) | 2015-11-11 |
Family
ID=47856117
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210435504.7A Active CN102982099B (en) | 2012-11-05 | 2012-11-05 | Personalized concurrent word segmentation processing system and processing method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102982099B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104699763A (en) * | 2015-02-11 | 2015-06-10 | 中国科学院新疆理化技术研究所 | Text similarity measuring system based on multi-feature fusion |
CN104881403A (en) * | 2015-06-04 | 2015-09-02 | 百度在线网络技术(北京)有限公司 | Word segmentation method and device |
CN106776746A (en) * | 2016-11-14 | 2017-05-31 | 天津南大通用数据技术股份有限公司 | Method and device for creating full-text index data |
CN108984512A (en) * | 2017-06-05 | 2018-12-11 | 中移信息技术有限公司 | Text word segmentation method and device |
CN109800412A (en) * | 2018-12-10 | 2019-05-24 | 鲁东大学 | Chinese word segmentation and big data information retrieval method and device |
CN109918665A (en) * | 2019-03-05 | 2019-06-21 | 湖北亿咖通科技有限公司 | Word segmentation method and device for text and electronic equipment |
CN110619122A (en) * | 2019-09-19 | 2019-12-27 | 中国联合网络通信集团有限公司 | Word segmentation processing method, device and equipment and computer readable storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1959674A (en) * | 2006-11-09 | 2007-05-09 | 华为技术有限公司 | Network search method, network search device, and user terminals |
CN101561818A (en) * | 2009-05-13 | 2009-10-21 | 北京用友移动商务科技有限公司 | Method for word segmentation processing and method for full-text retrieval |
CN101923545A (en) * | 2009-06-15 | 2010-12-22 | 北京百分通联传媒技术有限公司 | Method for recommending personalized information |
Non-Patent Citations (1)
Title |
---|
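褚锋 (Chu Feng): "基于个性化搜索的网页特征提取相关技术的研究" [Research on web page feature extraction techniques based on personalized search], 《中国优秀硕士学位论文全文数据库(电子期刊)》 [China Master's Theses Full-text Database (Electronic Journal)], no. 1, 15 December 2011 (2011-12-15) * |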
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104699763A (en) * | 2015-02-11 | 2015-06-10 | 中国科学院新疆理化技术研究所 | Text similarity measuring system based on multi-feature fusion |
CN104699763B (en) * | 2015-02-11 | 2017-10-17 | 中国科学院新疆理化技术研究所 | Text similarity measuring system based on multi-feature fusion |
CN104881403A (en) * | 2015-06-04 | 2015-09-02 | 百度在线网络技术(北京)有限公司 | Word segmentation method and device |
CN106776746A (en) * | 2016-11-14 | 2017-05-31 | 天津南大通用数据技术股份有限公司 | Method and device for creating full-text index data |
CN108984512A (en) * | 2017-06-05 | 2018-12-11 | 中移信息技术有限公司 | Text word segmentation method and device |
CN109800412A (en) * | 2018-12-10 | 2019-05-24 | 鲁东大学 | Chinese word segmentation and big data information retrieval method and device |
CN109918665A (en) * | 2019-03-05 | 2019-06-21 | 湖北亿咖通科技有限公司 | Word segmentation method and device for text and electronic equipment |
CN109918665B (en) * | 2019-03-05 | 2021-11-02 | 湖北亿咖通科技有限公司 | Word segmentation method and device for text and electronic equipment |
CN110619122A (en) * | 2019-09-19 | 2019-12-27 | 中国联合网络通信集团有限公司 | Word segmentation processing method, device and equipment and computer readable storage medium |
CN110619122B (en) * | 2019-09-19 | 2023-08-22 | 中国联合网络通信集团有限公司 | Word segmentation processing method, device, equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102982099B (en) | 2015-11-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102982099B (en) | Personalized parallel word segmentation processing system and processing method thereof | |
CN106528599B (en) | Fast fuzzy string matching algorithm for massive audio data | |
CN100462979C (en) | Distributed index file searching method, searching system and searching server | |
CN102184222B (en) | Quick searching method in large data volume storage | |
CN107710201B (en) | Storing and retrieving data from a bit vector search index | |
US10565198B2 (en) | Bit vector search index using shards | |
CN106528846B (en) | Search method and device | |
CN104778270A (en) | Storage method for multiple files | |
CN101996195A (en) | Searching method and device of voice information in audio files and equipment | |
US11748324B2 (en) | Reducing matching documents for a search query | |
CN102184256A (en) | Clustering method and system aiming at massive similar short texts | |
CN105740472A (en) | Distributed real-time full-text search method and system | |
CN104391908B (en) | Multi-keyword indexing method on graphs based on locality-sensitive hashing | |
US20160378805A1 (en) | Matching documents using a bit vector search index | |
CN103064842B (en) | Information subscription processing device and information subscription processing method | |
US20220358178A1 (en) | Data query method, electronic device, and storage medium | |
CN107943919B (en) | Query expansion method for conversational entity search | |
CN105740445A (en) | Database query method and device | |
CN105653697B (en) | Recommended word retrieval method and system | |
CN107038225A (en) | Retrieval method of an intelligent information retrieval system | |
CN102411568A (en) | Chinese word segmentation method based on travel industry feature word stock | |
CN101650742A (en) | System and method for prompting search condition during English search | |
CN102063454A (en) | Method and equipment combining search and application | |
CN109739885A (en) | Data query method, apparatus, equipment and storage medium based on local cache | |
CN103064846B (en) | Retrieval device and retrieval method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20130320; Assignee: Xi'an Tianyu Xinghua Information Technology Co.,Ltd.; Assignor: XI'AN University OF POSTS & TELECOMMUNICATIONS; Contract record no.: X2022980018057; Denomination of invention: A personalized parallel word segmentation processing system and its processing method; Granted publication date: 20151111; License type: Common License; Record date: 20221012 |