CN107220390A - Method and device for creating a Chinese name index - Google Patents

Publication number
CN107220390A
Authority
CN
China
Prior art keywords
chinese
character string
letter
character
pinyin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710616016.9A
Other languages
Chinese (zh)
Inventor
丛锐
谢恩鹏
刘磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Business System Co Ltd
Original Assignee
Shandong Inspur Business System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Business System Co Ltd
Priority to CN201710616016.9A
Publication of CN107220390A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures

Abstract

The invention provides a method and device for creating a Chinese name index. The method includes: obtaining the pinyin first letter of each Chinese character in a Chinese name; obtaining, for each pinyin first letter, at least one character string that begins with that letter, where any two adjacent pinyin first letters in a character string correspond to two Chinese characters that are adjacent in the Chinese name, and the pinyin first letters in the character string appear in the same order as the corresponding Chinese characters in the Chinese name; and associating each obtained character string with the Chinese name as an index of the Chinese name. The device includes an acquiring unit, a cutting unit and an associating unit. The scheme can improve the user's experience when searching for a Chinese name.

Description

Method and device for creating a Chinese name index
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for creating a Chinese name index.
Background
With the continuous development and progress of computer technology, smart devices are widely used in every field of daily life and production. When using a smart device, users often need to search for a Chinese name. For example, when a user orders a TV program through a smart set-top box, the user must use the remote control to search for the Chinese name of the program to be watched. In some scenarios, however, the user cannot search by typing the Chinese name itself, for instance because Chinese characters are difficult to enter with a remote control. An index therefore needs to be created for the Chinese name to make it easier to search for.
At present, an index is created for a Chinese name by obtaining the pinyin first letter of each Chinese character in the name and forming at least one character string that includes the pinyin first letter of the first Chinese character. These character strings serve as the index of the Chinese name, so that the name can be retrieved by pinyin. For example, if a Chinese name consists of 3 Chinese characters whose pinyin first letters are h, l and s in order, then h, hl and hls are used as the index of that Chinese name.
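The existing scheme described above amounts to indexing a name by the prefixes of its first-letter string. A minimal Java sketch of that reading follows; the class and method names are illustrative assumptions and do not come from the patent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the prior-art prefix index.
final class PrefixIndexSketch {
    // Returns every non-empty prefix of the first-letter string, shortest first.
    static List<String> prefixKeys(String firstLetters) {
        List<String> keys = new ArrayList<>();
        for (int end = 1; end <= firstLetters.length(); end++) {
            keys.add(firstLetters.substring(0, end));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(prefixKeys("hls")); // prints [h, hl, hls]
    }
}
```

Every key starts with h, so a query such as ls finds nothing under this scheme.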
Because every index string created by the current method includes the pinyin first letter of the first Chinese character, a user must know the first Chinese character of a Chinese name in order to retrieve it; otherwise the search fails. The existing method of creating an index for a Chinese name therefore gives users a poor experience when searching for Chinese names.
Summary of the invention
Embodiments of the present invention provide a method and device for creating a Chinese name index, which can improve the user's experience when searching for a Chinese name.
In a first aspect, an embodiment of the present invention provides a method for creating a Chinese name index, including:
Obtaining the pinyin first letter of each Chinese character in a Chinese name;
Obtaining, for each pinyin first letter, at least one character string that begins with that pinyin first letter, where any two adjacent pinyin first letters in the character string correspond to two Chinese characters that are adjacent in the Chinese name, and the pinyin first letters in the character string appear in the same order as the corresponding Chinese characters in the Chinese name;
Associating each obtained character string with the Chinese name as an index of the Chinese name.
Optionally, when the Chinese name includes at least two Chinese characters, obtaining at least one character string that begins with each pinyin first letter includes:
Combining the pinyin first letters in sequence, according to the order of the corresponding Chinese characters in the Chinese name, to form a character string to be cut;
Cutting the character string to be cut at least once, with different cut positions and numbers of cuts, to obtain at least two character strings, where each character string includes at least one pinyin first letter.
Optionally, cutting the character string to be cut at least once with different cut positions and numbers of cuts includes:
Using a recursive algorithm that takes the character string to be cut as its initial input and performs cutting operations on it, where the at least two results output by each cutting operation are taken as a corresponding number of character strings.
Optionally, after the character string to be cut is formed, the method further includes:
Associating the character string to be cut with the Chinese name as an index of the Chinese name.
Optionally, obtaining the pinyin first letter of each Chinese character in the Chinese name includes:
Translating, by means of a predefined Java archive (jar) package, each Chinese character in the Chinese name into its full pinyin spelling;
For each Chinese character, extracting, by means of the jar package, the first letter of that character's full pinyin spelling as the pinyin first letter of the Chinese character.
In a second aspect, an embodiment of the present invention further provides a device for creating a Chinese name index, including an acquiring unit, a cutting unit and an associating unit;
The acquiring unit is configured to obtain the pinyin first letter of each Chinese character in a Chinese name;
The cutting unit is configured to obtain, for each pinyin first letter obtained by the acquiring unit, at least one character string that begins with that pinyin first letter, where any two adjacent pinyin first letters in the character string correspond to two Chinese characters that are adjacent in the Chinese name, and the pinyin first letters in the character string appear in the same order as the corresponding Chinese characters in the Chinese name;
The associating unit is configured to associate each character string obtained by the cutting unit with the Chinese name as an index of the Chinese name.
Optionally, when the Chinese name includes at least two Chinese characters,
The cutting unit includes a combining subunit and a cutting subunit;
The combining subunit is configured to combine the pinyin first letters in sequence, according to the order of the corresponding Chinese characters in the Chinese name, to form a character string to be cut;
The cutting subunit is configured to cut the character string formed by the combining subunit at least once, with different cut positions and numbers of cuts, to obtain at least two character strings, where each character string includes at least one pinyin first letter.
Optionally,
The cutting subunit is configured to use a recursive algorithm that takes the character string to be cut as its initial input and performs cutting operations on it, where the at least two results output by each cutting operation are taken as a corresponding number of character strings.
Optionally,
The associating unit is further configured to associate the character string to be cut with the Chinese name as an index of the Chinese name.
Optionally,
The acquiring unit includes a translating subunit and an extracting subunit;
The translating subunit is configured to translate, by means of a predefined jar package, each Chinese character in the Chinese name into its full pinyin spelling;
The extracting subunit is configured to extract, for each Chinese character and by means of the jar package, the first letter of that character's full pinyin spelling as the pinyin first letter of the Chinese character.
With the method and device for creating a Chinese name index provided by the embodiments of the present invention, the pinyin first letter of each Chinese character in the Chinese name is obtained and, for each pinyin first letter, at least one character string beginning with that letter is obtained. In every such character string, any two adjacent pinyin first letters correspond to two Chinese characters that are adjacent in the Chinese name, and the letters appear in the same order as their Chinese characters. Each obtained character string is associated with the Chinese name and serves as an index of it. Because the index contains character strings that begin with the pinyin first letter of every Chinese character in the name, a user who does not know the first Chinese character can still retrieve the Chinese name from the pinyin first letters of one or more later characters, which improves the user's experience when searching for a Chinese name.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for creating a Chinese name index provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another method for creating a Chinese name index provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the equipment hosting a device for creating a Chinese name index provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a device for creating a Chinese name index provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of another device for creating a Chinese name index provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of yet another device for creating a Chinese name index provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a method for creating a Chinese name index. The method may include the following steps:
Step 101: Obtain the pinyin first letter of each Chinese character in a Chinese name;
Step 102: Obtain, for each pinyin first letter, at least one character string that begins with that pinyin first letter, where any two adjacent pinyin first letters in the character string correspond to two Chinese characters that are adjacent in the Chinese name, and the pinyin first letters in the character string appear in the same order as the corresponding Chinese characters in the Chinese name;
Step 103: Associate each obtained character string with the Chinese name as an index of the Chinese name.
In the method for creating a Chinese name index provided by this embodiment, the pinyin first letter of each Chinese character in the Chinese name is obtained and, for each pinyin first letter, at least one character string beginning with that letter is obtained. In every such character string, any two adjacent pinyin first letters correspond to two Chinese characters that are adjacent in the Chinese name, and the letters appear in the same order as their Chinese characters. Each obtained character string is associated with the Chinese name and serves as an index of it. Because the index contains character strings that begin with the pinyin first letter of every Chinese character in the name, a user who does not know the first Chinese character can still retrieve the Chinese name from the pinyin first letters of one or more later characters, which improves the user's experience when searching for a Chinese name.
For example, the Chinese name is "甲乙丙" (jia, yi, bing). The pinyin first letter of the Chinese character "甲" is j, that of "乙" is y, and that of "丙" is b. Character strings beginning with j, y and b respectively are obtained as the index of the Chinese name "甲乙丙"; for example, the index includes j, yb and b.
Optionally, as shown in Fig. 1,
When the Chinese name includes at least two Chinese characters, the process in step 102 of obtaining at least one character string beginning with each pinyin first letter may be implemented as follows:
According to the order of the corresponding Chinese characters in the Chinese name, the pinyin first letters are combined in sequence to form a character string to be cut. The character string to be cut is then cut at least once with different cut positions and numbers of cuts; each cut yields at least two character strings, and every character string is guaranteed to include at least one pinyin first letter.
For example, following the order of the 3 Chinese characters in the Chinese name "甲乙丙", the pinyin first letters j, y and b are combined in sequence to form the character string to be cut, jyb. After jyb is cut several times with different cut positions and numbers of cuts, 5 character strings are obtained: j, y, b, jy and yb.
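The five strings in this example are the contiguous substrings of jyb that are shorter than jyb itself. As a rough illustration only (the patent does not prescribe this loop, and the class and method names are assumptions), they can be enumerated in Java as follows:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: enumerate contiguous substrings of the first-letter string.
final class SubstringKeysSketch {
    // Lists every contiguous, non-empty substring that is shorter than the input,
    // ordered by start position and then by length, without duplicates.
    static List<String> properSubstrings(String firstLetters) {
        Set<String> keys = new LinkedHashSet<>();
        int n = firstLetters.length();
        for (int start = 0; start < n; start++) {
            for (int end = start + 1; end <= n; end++) {
                if (end - start < n) { // leave out the full string itself
                    keys.add(firstLetters.substring(start, end));
                }
            }
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        System.out.println(properSubstrings("jyb")); // prints [j, jy, y, yb, b]
    }
}
```

The set removes repeats, which matter when a name contains two characters with the same pinyin first letter.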
By combining different cut positions and numbers of cuts, the character string to be cut, which consists of all the pinyin first letters, is cut multiple times to obtain the character strings formed by one or more pinyin first letters in sequence. The resulting character strings cover the pinyin first-letter combinations of any number of adjacent Chinese characters in the Chinese name. Once these character strings are used as the index of the Chinese name, a user can retrieve the name from the pinyin first letter of any single Chinese character in it, or from the combined pinyin first letters of any number of adjacent Chinese characters. On the one hand, because the user can search with the pinyin first letters of any number of characters at any position in the Chinese name, searching becomes more convenient. On the other hand, because multiple indexes are created for the Chinese name, a user who enters the pinyin string of any one index can quickly retrieve the one or more Chinese names associated with that string; this comparatively fuzzy way of searching can speed up the retrieval of Chinese names.
Optionally, when the character string to be cut is cut with different cut positions and numbers of cuts, a recursive algorithm may be used: the character string to be cut is the initial input, cutting operations are performed on it, and the at least two results output by each cutting operation are used as index character strings.
For example, the character string to be cut, jyb, is taken as the initial input and cut by the recursive algorithm. The first cutting operation outputs the character strings jy and yb, which are used as two indexes of the Chinese name "甲乙丙". The second cutting operation takes jy and yb as its inputs: with jy as input it outputs j and y, and with yb as input it outputs y and b. The output character strings j, y and b are used as 3 further indexes of the Chinese name "甲乙丙".
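The recursion in this example can be sketched as below. This is one possible reading, assumed for illustration: each cutting operation drops either the last or the first letter of its input, and the two shorter strings are recorded and cut again until single letters remain. The names `RecursiveCutSketch`, `cut` and `cutAll` are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of the recursive cutting operation.
final class RecursiveCutSketch {
    // Cuts s into "all but the last letter" and "all but the first letter",
    // records both, and recurses only into strings not seen before.
    static void cut(String s, Set<String> out) {
        if (s.length() < 2) {
            return; // a single letter cannot be cut further
        }
        String left = s.substring(0, s.length() - 1);
        String right = s.substring(1);
        boolean leftIsNew = out.add(left);
        boolean rightIsNew = out.add(right);
        if (leftIsNew) {
            cut(left, out);
        }
        if (rightIsNew) {
            cut(right, out);
        }
    }

    // Returns the cut results in the order they were first produced.
    static Set<String> cutAll(String firstLetters) {
        Set<String> out = new LinkedHashSet<>();
        cut(firstLetters, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cutAll("jyb")); // prints [jy, yb, j, y, b]
    }
}
```

Skipping strings that were already recorded keeps the number of recursive calls proportional to the number of distinct substrings rather than doubling at every level.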
Cutting the character string to be cut with a recursive algorithm is fast and accurate, and ensures that the character strings produced cover every sequential combination of the pinyin first letters, which in turn ensures that users can search for Chinese names conveniently and quickly.
Optionally, in addition to associating the character strings obtained by cutting with the Chinese name as its indexes, the character string to be cut is itself also associated with the Chinese name as one of its indexes.
On the one hand, a user searching for a Chinese name may enter the pinyin first letter of every Chinese character in the name; using the character string to be cut as an index ensures that the Chinese name can be retrieved in that case. On the other hand, because the character string to be cut reflects every Chinese character in the Chinese name, relatively few Chinese names share it as an index. A search on the character string to be cut therefore returns fewer Chinese names, so the user can quickly find the required one among the results, which further improves the search experience.
Optionally, as shown in Fig. 1,
When the pinyin first letter of each Chinese character in the Chinese name is obtained in step 101, a jar package created in advance may be used to translate each Chinese character in the Chinese name into its full pinyin spelling; then, for each Chinese character, the jar package is used to extract the first letter of that character's full pinyin spelling as its pinyin first letter.
For example, the jar package created in advance translates "甲" in the Chinese name "甲乙丙" into the full pinyin spelling jia, "乙" into yi, and "丙" into bing. The jar package is then used again to extract j from jia as the pinyin first letter of "甲", y from yi as the pinyin first letter of "乙", and b from bing as the pinyin first letter of "丙".
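The two jar-package operations, translating a character into its full pinyin spelling and taking the first letter, can be illustrated with a small stand-in. The patent does not name the jar package or its methods, so the lookup table below is purely hypothetical and covers only the three characters of the example (written as Unicode escapes); a real system would call a pinyin conversion library instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the jar package; not a real pinyin library.
final class FirstLettersSketch {
    private static final Map<Character, String> FULL_PINYIN = new HashMap<>();
    static {
        FULL_PINYIN.put('\u7532', "jia");  // U+7532, read "jia"
        FULL_PINYIN.put('\u4E59', "yi");   // U+4E59, read "yi"
        FULL_PINYIN.put('\u4E19', "bing"); // U+4E19, read "bing"
    }

    // Translates each character to its full spelling, then keeps the first letter.
    static String firstLetters(String chineseName) {
        StringBuilder sb = new StringBuilder();
        for (char c : chineseName.toCharArray()) {
            String full = FULL_PINYIN.get(c);
            if (full == null || full.isEmpty()) {
                throw new IllegalArgumentException("No pinyin known for U+" + Integer.toHexString(c));
            }
            sb.append(full.charAt(0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstLetters("\u7532\u4E59\u4E19")); // prints jyb
    }
}
```

The sketch fails loudly on characters it does not know, since silently skipping one would break the adjacency property the index relies on.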
With a jar package defined in advance, creating an index for a Chinese name only requires calling the jar package to obtain the pinyin first letter of each Chinese character, which makes these letters easy to obtain. In addition, because a jar package can be imported directly into a project and its methods used there, the defined jar package can be brought into different Chinese name index creation systems, reducing development work.
Taking the creation of an index for the Chinese name of an on-demand TV program in a smart set-top box as an example, the method for creating a Chinese name index provided by the embodiments of the present invention is described in further detail below. As shown in Fig. 2, the method may include the following steps:
Step 201: Obtain the Chinese name for which an index is to be created.
In an embodiment of the present invention, after a new on-demand TV program is added to the smart set-top box, an index needs to be created for the Chinese name of the newly added program so that a user can retrieve the program by entering some letters with the remote control. Before the index is created, the Chinese name of the newly added on-demand TV program must first be obtained.
For example, the Chinese name "甲乙丙" of the newly added on-demand TV program A is obtained.
Step 202: Use a predefined jar package to translate each Chinese character in the Chinese name into its full pinyin spelling.
In an embodiment of the present invention, a jar package is defined in advance, or an existing one is obtained, and is used to translate each Chinese character in the Chinese name of the newly added on-demand TV program into its full pinyin spelling.
For example, the predefined jar package translates the Chinese character "甲" in the Chinese name "甲乙丙" of the newly added on-demand TV program A into the full pinyin spelling jia, "乙" into yi, and "丙" into bing.
Step 203: Use the jar package to extract the pinyin first letter of each Chinese character from its full pinyin spelling.
In an embodiment of the present invention, after the full pinyin spelling of each Chinese character in the Chinese name is obtained, the jar package is used again to extract the pinyin first letter from the full pinyin spelling of each character, yielding the pinyin first letter of every Chinese character in the Chinese name.
For example, j is extracted from the full pinyin spelling jia as the pinyin first letter of "甲", y from yi as the pinyin first letter of "乙", and b from bing as the pinyin first letter of "丙".
Step 204: Combine the pinyin first letters of the Chinese characters to form the character string to be cut.
In an embodiment of the present invention, after the pinyin first letter of each Chinese character in the Chinese name is obtained, the pinyin first letters are combined in sequence, according to the order of the corresponding Chinese characters in the Chinese name, to form the character string to be cut.
For example, following the order of the Chinese characters "甲", "乙" and "丙" in the Chinese name "甲乙丙", their pinyin first letters j, y and b are combined into the character string to be cut, jyb.
Step 205: Cut the character string to be cut by a recursive algorithm to obtain at least one character string.
In an embodiment of the present invention, after the character string to be cut has been formed from the pinyin first letters of the Chinese characters in the Chinese name, it is taken as the initial input and cut by a recursive algorithm, and each output character string is collected.
For example, the character string to be cut, jyb, is taken as the initial input and cut by the recursive algorithm, which outputs the character strings jy, yb, j, y and b.
Step 206: Associate each character string and the character string to be cut with the Chinese name as indexes of the Chinese name.
In an embodiment of the present invention, after the character strings are obtained by cutting, each of them, together with the character string to be cut, is associated with the Chinese name and used as an index of the Chinese name. The associations between the Chinese name of the newly added on-demand TV program and these character strings are stored in the smart set-top box. When a user enters any one of the character strings, or the character string to be cut, with the remote control, the newly added on-demand TV program can be retrieved.
For example, the character strings jy, yb, j, y and b are associated with the Chinese name "甲乙丙" of the newly added on-demand TV program A, as is the character string to be cut, jyb; together, jy, yb, j, y, b and jyb serve as the indexes of the Chinese name "甲乙丙". When the user sends any one of jy, yb, j, y, b and jyb to the smart set-top box with the remote control, the set-top box can use the indexes to retrieve the Chinese name "甲乙丙" and then show the newly added on-demand TV program A to the user.
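Storing and querying these associations can be pictured as a map from index string to the names that share it. The sketch below is an assumption about one possible in-memory layout, not the set-top box implementation; the class name, the method names and the second program (with first letters jyd) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of associating index strings with names.
final class NameIndexSketch {
    private final Map<String, List<String>> keyToNames = new HashMap<>();

    // Associates every index string with the given name.
    void associate(String name, List<String> keys) {
        for (String key : keys) {
            List<String> names = keyToNames.computeIfAbsent(key, k -> new ArrayList<>());
            if (!names.contains(name)) {
                names.add(name);
            }
        }
    }

    // Returns the names associated with exactly the typed letters.
    List<String> lookup(String typed) {
        return keyToNames.getOrDefault(typed, Collections.emptyList());
    }

    public static void main(String[] args) {
        NameIndexSketch index = new NameIndexSketch();
        index.associate("Program A", Arrays.asList("jy", "yb", "j", "y", "b", "jyb"));
        index.associate("Program B", Arrays.asList("jy", "yd", "j", "y", "d", "jyd"));
        System.out.println(index.lookup("jy"));  // prints [Program A, Program B]
        System.out.println(index.lookup("jyb")); // prints [Program A]
    }
}
```

The usage in `main` also shows why the full first-letter string is worth indexing: the short key jy matches both programs, while jyb narrows the result to one.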
In an embodiment of the present invention, the cutting of the character string to be cut by a recursive algorithm can be implemented by a script. [The script is not reproduced in this text.]
As shown in Fig. 3 and Fig. 4, an embodiment of the present invention provides a device for creating a Chinese name index. The device embodiment may be implemented in software, in hardware, or in a combination of the two. At the hardware level, Fig. 3 is a hardware structure diagram of the equipment hosting the device for creating a Chinese name index provided by an embodiment of the present invention; in addition to the processor, memory, network interface and non-volatile storage shown in Fig. 3, the equipment hosting the device may also include other hardware, such as a forwarding chip responsible for processing messages. Taking a software implementation as an example, as shown in Fig. 4, the device in the logical sense is formed when the CPU of the hosting equipment reads the corresponding computer program instructions from non-volatile storage into memory and runs them. The device for creating a Chinese name index provided by this embodiment includes an acquiring unit 401, a cutting unit 402 and an associating unit 403;
The acquiring unit 401 is configured to obtain the pinyin first letter of each Chinese character in a Chinese name;
The cutting unit 402 is configured to obtain, for each pinyin first letter obtained by the acquiring unit 401, at least one character string that begins with that pinyin first letter, where any two adjacent pinyin first letters in the character string correspond to two Chinese characters that are adjacent in the Chinese name, and the pinyin first letters in the character string appear in the same order as the corresponding Chinese characters in the Chinese name;
The associating unit 403 is configured to associate each character string obtained by the cutting unit 402 with the Chinese name as an index of the Chinese name.
Optionally, as shown in Fig. 5,
When the Chinese name includes at least two Chinese characters, the cutting unit 402 includes a combining subunit 4021 and a cutting subunit 4022;
The combining subunit 4021 is configured to combine the pinyin first letters in sequence, according to the order of the corresponding Chinese characters in the Chinese name, to form a character string to be cut;
The cutting subunit 4022 is configured to cut the character string formed by the combining subunit 4021 at least once, with different cut positions and numbers of cuts, to obtain at least two character strings, where each character string includes at least one pinyin first letter.
Optionally, as shown in Fig. 5,
The cutting subunit 4022 is configured to use a recursive algorithm that takes the character string to be cut as its initial input and performs cutting operations on it, where the at least two results output by each cutting operation are taken as a corresponding number of character strings.
Optionally, as shown in Fig. 5,
The associating unit 403 is further configured to associate the character string to be cut with the Chinese name as an index of the Chinese name.
Optionally, as shown in Fig. 6,
The acquiring unit 401 includes a translating subunit 4011 and an extracting subunit 4012;
The translating subunit 4011 is configured to translate, by means of a predefined jar package, each Chinese character in the Chinese name into its full pinyin spelling;
The extracting subunit 4012 is configured to extract, for each Chinese character and by means of the jar package, the first letter of that character's full pinyin spelling as the pinyin first letter of the Chinese character.
It should be noted that the information exchange between the units of the above device, their execution processes and similar details are based on the same concept as the method embodiments of the present invention. For specifics, refer to the description of the method embodiments, which is not repeated here.
An embodiment of the present invention further provides a computer-readable medium that stores execution instructions. When a processor of a storage controller executes the execution instructions, the storage controller performs the method for creating a Chinese name index provided by any of the foregoing embodiments.
An embodiment of the present invention further provides a storage controller, including a processor, a memory and a bus;
The memory is configured to store execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller performs the method for creating a Chinese name index provided by any of the foregoing embodiments.
In summary, the method and device for creating a Chinese name index provided by the embodiments of the present invention have at least the following beneficial effects:
1. In the embodiments of the present invention, after the pinyin first letter of each Chinese character in the Chinese name is obtained, at least one character string beginning with that pinyin first letter is obtained for each pinyin first letter, such that the two Chinese characters corresponding to any two adjacent pinyin first letters in a character string are adjacent in the Chinese name, and the order of the pinyin first letters in each character string is the same as the order of the corresponding Chinese characters in the Chinese name. After each obtained character string is associated with the Chinese name, each character string serves as an index of the Chinese name. Because character strings beginning with the pinyin first letter of each Chinese character in the Chinese name are used as indexes, a user who does not know the first Chinese character of the Chinese name can still retrieve the Chinese name by the pinyin first letters of one or more later Chinese characters, which improves the user's experience when retrieving Chinese names.
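To make this effect concrete, the sketch below associates every qualifying character string with its Chinese name and then retrieves a name from the first letters of its last two characters only. The helper names `index_strings` and `build_index`, the dictionary layout, and the two sample names with their first-letter strings are assumptions made for illustration and do not come from the embodiments.

```python
from collections import defaultdict


def index_strings(initials):
    """All runs of adjacent first letters, kept in their original order."""
    n = len(initials)
    return {initials[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def build_index(names_with_initials):
    """Map each index string to the set of Chinese names it is associated with."""
    index = defaultdict(set)
    for name, initials in names_with_initials.items():
        for key in index_strings(initials):
            index[key].add(name)
    return index


# Sample data: name -> combined pinyin first letters (assumed precomputed).
index = build_index({"浪潮软件": "lcrj", "浪潮商用": "lcsy"})

print(index["rj"])      # lookup by the last two characters' first letters
print(index.get("lj"))  # "l" and "j" are not adjacent, so nothing is indexed
```

Because only runs of adjacent letters are indexed, a query such as "lj" that skips characters finds nothing, while "rj" finds the name even though the user never entered its first character.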
2. In the embodiments of the present invention, when the character strings serving as indexes of the Chinese name are obtained, the pinyin first letters of the Chinese characters in the Chinese name are first combined in order to form a character string to be cut, and the character string to be cut is then cut multiple times with different cutting positions and different numbers of cuts to obtain multiple character strings. The obtained character strings cover the pinyin first letter combinations of any number of adjacent Chinese characters in the Chinese name. In this way, the user can retrieve the Chinese name by a combination of the pinyin first letters of any number of adjacent Chinese characters at any position in the Chinese name, which makes retrieval of Chinese names more convenient.
3. In the embodiments of the present invention, when the character string to be cut is cut, it may be used as the initial input and cut by a recursive algorithm. In this way, the character string to be cut can be cut quickly and accurately, ensuring that the character strings produced by the cutting operation cover the various sequential combinations of the pinyin first letters, and thereby ensuring the convenience and speed with which users retrieve Chinese names.
4. In the embodiments of the present invention, in addition to associating the character strings obtained by cutting the character string to be cut with the Chinese name as indexes of the Chinese name, the character string to be cut itself is also associated with the Chinese name as an index of the Chinese name. In this way, a user who can enter the pinyin first letters of all the Chinese characters included in the Chinese name can retrieve the Chinese name more quickly, which improves retrieval speed.
It should be noted that herein, such as first and second etc relational terms are used merely to an entity Or operation makes a distinction with another entity or operation, and not necessarily require or imply exist between these entities or operation Any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant be intended to it is non- It is exclusive to include, so that process, method, article or equipment including a series of key elements not only include those key elements, But also other key elements including being not expressly set out, or also include solid by this process, method, article or equipment Some key elements.In the absence of more restrictions, by sentence " including the key element that a 〃 〃 " is limited, it is not excluded that Also there is other identical factor in the process including the key element, method, article or equipment.
One of ordinary skill in the art will appreciate that:Realizing all or part of step of above method embodiment can pass through Programmed instruction related hardware is completed, and foregoing program can be stored in the storage medium of embodied on computer readable, the program Upon execution, the step of including above method embodiment is performed;And foregoing storage medium includes:ROM, RAM, magnetic disc or light Disk etc. is various can be with the medium of store program codes.
It is last it should be noted that:Presently preferred embodiments of the present invention is the foregoing is only, the skill of the present invention is merely to illustrate Art scheme, is not intended to limit the scope of the present invention.Any modification for being made within the spirit and principles of the invention, Equivalent substitution, improvement etc., are all contained in protection scope of the present invention.

Claims (10)

1. A method for creating a Chinese name index, characterized by comprising:
obtaining the pinyin first letter of each Chinese character in a Chinese name;
obtaining, for each pinyin first letter, at least one character string beginning with that pinyin first letter, wherein the two Chinese characters corresponding to any two adjacent pinyin first letters in the character string are adjacent in the Chinese name, and the order of the pinyin first letters in the character string is the same as the order of the corresponding Chinese characters in the Chinese name; and
associating each obtained character string with the Chinese name, as an index of the Chinese name.
2. The method according to claim 1, characterized in that, when the Chinese name includes at least two Chinese characters, the obtaining, for each pinyin first letter, at least one character string beginning with that pinyin first letter comprises:
combining the pinyin first letters in sequence, according to the order of the corresponding Chinese characters in the Chinese name, to form a character string to be cut; and
cutting the character string to be cut at least once, with different cutting positions and different numbers of cuts, to obtain at least two of the character strings, wherein each character string includes at least one pinyin first letter.
3. The method according to claim 2, characterized in that the cutting the character string to be cut at least once with different cutting positions and different numbers of cuts comprises:
performing, by a recursive algorithm and with the character string to be cut as the initial input, a cutting operation on the character string to be cut, and taking the at least two results output by each cutting operation as a corresponding number of the character strings.
4. The method according to claim 2, characterized by further comprising, after the forming a character string to be cut:
associating the character string to be cut with the Chinese name, as an index of the Chinese name.
5. The method according to any one of claims 1 to 4, characterized in that the obtaining the pinyin first letter of each Chinese character in a Chinese name comprises:
converting, by means of a predefined Java archive (jar) package, each Chinese character included in the Chinese name into its corresponding pinyin spelling; and
extracting, for each Chinese character and by means of the jar package, the first letter of the pinyin spelling corresponding to that Chinese character as the pinyin first letter of that Chinese character.
6. A device for creating a Chinese name index, characterized by comprising: an acquiring unit, a cutting unit and an association unit;
the acquiring unit is configured to obtain the pinyin first letter of each Chinese character in a Chinese name;
the cutting unit is configured to obtain, for each pinyin first letter obtained by the acquiring unit, at least one character string beginning with that pinyin first letter, wherein the two Chinese characters corresponding to any two adjacent pinyin first letters in the character string are adjacent in the Chinese name, and the order of the pinyin first letters in the character string is the same as the order of the corresponding Chinese characters in the Chinese name;
the association unit is configured to associate each character string obtained by the cutting unit with the Chinese name, as an index of the Chinese name.
7. The device according to claim 6, characterized in that, when the Chinese name includes at least two Chinese characters,
the cutting unit comprises a combining subunit and a cutting subunit;
the combining subunit is configured to combine the pinyin first letters in sequence, according to the order of the corresponding Chinese characters in the Chinese name, to form a character string to be cut;
the cutting subunit is configured to cut the character string to be cut, formed by the combining subunit, at least once, with different cutting positions and different numbers of cuts, to obtain at least two of the character strings, wherein each character string includes at least one pinyin first letter.
8. The device according to claim 7, characterized in that
the cutting subunit is configured to perform, by a recursive algorithm and with the character string to be cut as the initial input, a cutting operation on the character string to be cut, and to take the at least two results output by each cutting operation as a corresponding number of the character strings.
9. The device according to claim 7, characterized in that
the association unit is further configured to associate the character string to be cut with the Chinese name, as an index of the Chinese name.
10. The device according to any one of claims 6 to 9, characterized in that
the acquiring unit comprises a translation subunit and an extraction subunit;
the translation subunit is configured to convert, by means of a predefined jar package, each Chinese character included in the Chinese name into its corresponding pinyin spelling;
the extraction subunit is configured to extract, for each Chinese character and by means of the jar package, the first letter of the pinyin spelling corresponding to that Chinese character as the pinyin first letter of that Chinese character.
CN201710616016.9A 2017-07-26 2017-07-26 A kind of method and device for creating Chinese index Pending CN107220390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710616016.9A CN107220390A (en) 2017-07-26 2017-07-26 A kind of method and device for creating Chinese index

Publications (1)

Publication Number Publication Date
CN107220390A true CN107220390A (en) 2017-09-29

Family

ID=59954227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710616016.9A Pending CN107220390A (en) 2017-07-26 2017-07-26 A kind of method and device for creating Chinese index

Country Status (1)

Country Link
CN (1) CN107220390A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101162146A (en) * 2007-02-01 2008-04-16 厦门雅迅网络股份有限公司 Method for searching interest points according to the first letter of phonation in networking vehicle mounted guidance apparatus
CN101566989A (en) * 2008-04-23 2009-10-28 中国电信股份有限公司 Method and system for converting information querying codes into Chinese characters
CN103838876A (en) * 2014-03-27 2014-06-04 烽火通信科技股份有限公司 Method for retrieving document through pinyin and document retrieval system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170929