CN107220390A - Method and device for creating a Chinese-name index - Google Patents
- Publication number
- CN107220390A (application number CN201710616016.9A)
- Authority
- CN
- China
- Prior art keywords
- chinese
- character string
- letter
- character
- pinyin
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
Abstract
The invention provides a method and device for creating an index for a Chinese name. The method includes three operations. First, obtain the pinyin initial of each Chinese character in the Chinese name. Second, for each pinyin initial, obtain at least one character string that begins with that initial. In each such string, any two adjacent pinyin initials correspond to two Chinese characters that are adjacent in the name, and the initials appear in the same order as their characters in the name. Third, associate each obtained character string with the Chinese name as an index of the name. The device includes an acquiring unit, a cutting unit and an associating unit. The scheme improves the user's experience when searching for Chinese names.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a method and device for creating a Chinese-name index.
Background
As computer technology continues to develop, smart devices are widely used in every area of life and production. Users of smart devices often need to search for Chinese names. For example, to order a TV program on demand through a smart set-top box, the user must search for the program's Chinese name with the remote control. In some scenarios, however, the user cannot search by typing the Chinese name itself; entering Chinese characters with a remote control is difficult, for instance. An index must therefore be created for each Chinese name so that it can be searched.
At present, an index for a Chinese name is created as follows. The pinyin initial of each Chinese character in the name is obtained. The character strings that include the initial of the name's first character, that is, the prefixes of the initials, are then used as the name's index, so that the name can be retrieved by pinyin. For example, if a Chinese name contains 3 characters whose pinyin initials are h, l and s in order, then h, hl and hls are used as the index of that name.
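The prior-art scheme amounts to indexing only the prefixes of the initials string. The minimal Java sketch below shows why a user who remembers only later characters finds nothing. Java is chosen because the embodiments rely on a jar package. `PrefixIndexSketch` and `prefixIndex` are hypothetical names, and the input is assumed to be the initials string that has already been extracted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the prior-art index: only prefixes of the initials string. */
class PrefixIndexSketch {

    /** Returns every prefix of the initials string, e.g. "hls" -> [h, hl, hls]. */
    static List<String> prefixIndex(String initials) {
        List<String> keys = new ArrayList<>();
        for (int end = 1; end <= initials.length(); end++) {
            keys.add(initials.substring(0, end)); // every key starts at the first initial
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(prefixIndex("hls")); // [h, hl, hls]
        // A user who only remembers the last two characters types "ls" and finds no key.
        System.out.println(prefixIndex("hls").contains("ls")); // false
    }
}
```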
Every index produced by this existing method contains the pinyin initial of the name's first Chinese character. A user must therefore know the first character of a Chinese name to search for it; otherwise the name cannot be found. As a result, the existing method gives users a poor experience when searching for Chinese names.
Summary of the invention
The embodiments of the invention provide a method and device for creating a Chinese-name index that improve the user's experience when searching for Chinese names.
In a first aspect, an embodiment of the invention provides a method for creating a Chinese-name index, including:
obtaining the pinyin initial of each Chinese character in a Chinese name;
obtaining, for each pinyin initial, at least one character string beginning with that initial, where any two adjacent pinyin initials in the string correspond to two Chinese characters that are adjacent in the name, and the initials in the string appear in the same order as their characters in the name;
associating each obtained character string with the Chinese name as an index of the name.
Optionally, when the Chinese name includes at least two Chinese characters, obtaining at least one character string beginning with each pinyin initial includes:
combining the pinyin initials in the order of their characters in the Chinese name to form a to-be-cut character string;
cutting the to-be-cut string at least once, with different cut positions and numbers of cuts, to obtain at least two character strings, each of which includes at least one pinyin initial.
Optionally, cutting the to-be-cut string at least once with different cut positions and numbers of cuts includes:
using a recursive algorithm that takes the to-be-cut string as its initial input and performs cutting operations on it. The at least two results output by each cutting operation are used as the corresponding number of character strings.
Optionally, after the to-be-cut string is formed, the method further includes:
associating the to-be-cut string with the Chinese name as one index of the name.
Optionally, obtaining the pinyin initial of each Chinese character in the Chinese name includes:
translating each Chinese character in the name into its full pinyin spelling by means of a predefined Java archive (jar) package;
for each Chinese character, extracting the first letter of its full pinyin spelling by means of the jar package as the character's pinyin initial.
In a second aspect, an embodiment of the invention further provides a device for creating a Chinese-name index. The device includes an acquiring unit, a cutting unit and an associating unit;
the acquiring unit is configured to obtain the pinyin initial of each Chinese character in a Chinese name;
the cutting unit is configured to obtain, for each pinyin initial obtained by the acquiring unit, at least one character string beginning with that initial. In each string, any two adjacent pinyin initials correspond to two characters that are adjacent in the name, and the initials appear in the same order as their characters in the name;
the associating unit is configured to associate each character string obtained by the cutting unit with the Chinese name as an index of the name.
Optionally, when the Chinese name includes at least two Chinese characters,
the cutting unit includes a combining subunit and a cutting subunit;
the combining subunit is configured to combine the pinyin initials in the order of their characters in the Chinese name to form a to-be-cut character string;
the cutting subunit is configured to cut the to-be-cut string formed by the combining subunit at least once, with different cut positions and numbers of cuts, to obtain at least two character strings, each of which includes at least one pinyin initial.
Optionally,
the cutting subunit is configured to use a recursive algorithm that takes the to-be-cut string as its initial input and performs cutting operations on it. The at least two results output by each cutting operation are used as the corresponding number of character strings.
Optionally,
the associating unit is further configured to associate the to-be-cut string with the Chinese name as one index of the name.
Optionally,
the acquiring unit includes a translating subunit and an extracting subunit;
the translating subunit is configured to translate, by means of a predefined jar package, each Chinese character in the Chinese name into its full pinyin spelling;
the extracting subunit is configured to extract, for each Chinese character and by means of the jar package, the first letter of the character's full pinyin spelling as its pinyin initial.
The method and device first obtain the pinyin initial of each Chinese character in a Chinese name. They then obtain at least one character string beginning with each pinyin initial. In every such string, any two adjacent initials correspond to two characters that are adjacent in the name, and the initials keep the order of their characters in the name. Each obtained string is associated with the name and serves as its index. Because strings beginning with the initial of every character in the name are used as indexes, a user who does not know the name's first character can still find it through the pinyin initials of one or more later characters. This improves the user's experience when searching for Chinese names.
Brief description of the drawings
To illustrate the technical solutions of the embodiments and of the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for creating a Chinese-name index according to an embodiment of the invention;
Fig. 2 is a flowchart of another method for creating a Chinese-name index according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the equipment on which a device for creating a Chinese-name index according to an embodiment of the invention is located;
Fig. 4 is a schematic diagram of a device for creating a Chinese-name index according to an embodiment of the invention;
Fig. 5 is a schematic diagram of another device for creating a Chinese-name index according to an embodiment of the invention;
Fig. 6 is a schematic diagram of yet another device for creating a Chinese-name index according to an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments clearer, the technical solutions are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from them without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, an embodiment of the invention provides a method for creating a Chinese-name index, which may include the following steps:
Step 101: obtain the pinyin initial of each Chinese character in the Chinese name;
Step 102: obtain, for each pinyin initial, at least one character string beginning with that initial. In each string, any two adjacent pinyin initials correspond to two characters that are adjacent in the name, and the initials appear in the same order as their characters in the name;
Step 103: associate each obtained character string with the Chinese name as an index of the name.
In this method, character strings beginning with the pinyin initial of every character in the name serve as the name's indexes. A user who does not know a name's first character can therefore still find it through the initials of one or more later characters, which improves the search experience.
For example, take the Chinese name "甲乙丙" (pinyin jia yi bing). The pinyin initial of "甲" is j, that of "乙" is y and that of "丙" is b. Character strings beginning with j, y and b are obtained and used as indexes of "甲乙丙"; for instance, the indexes include j, yb and b.
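Read literally, Step 102 asks for strings of consecutive initials that start at each initial. Taking every possible end position yields all runs of adjacent initials. The Java sketch below assumes that maximal reading; the text itself only requires at least one string per initial. It uses a plain `Map` as a stand-in for whatever storage associates keys with names, and the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch of Steps 101-103 (initials already extracted): for every start
 * position, emit each run of consecutive initials beginning there, then map each
 * run back to the Chinese name.
 */
class StartAtEveryInitialSketch {

    /** All runs of adjacent initials, in order of start position, then length. */
    static Set<String> runsFrom(String initials) {
        Set<String> keys = new LinkedHashSet<>();
        for (int start = 0; start < initials.length(); start++) {       // one pass per initial
            for (int end = start + 1; end <= initials.length(); end++) {
                keys.add(initials.substring(start, end));                // adjacent and in order
            }
        }
        return keys;
    }

    /** Step 103: associate every key with the name (key -> names). */
    static void associate(Map<String, Set<String>> index, String name, String initials) {
        for (String key : runsFrom(initials)) {
            index.computeIfAbsent(key, k -> new TreeSet<>()).add(name);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new TreeMap<>();
        associate(index, "甲乙丙", "jyb");
        System.out.println(runsFrom("jyb"));   // [j, jy, jyb, y, yb, b]
        System.out.println(index.get("yb"));   // [甲乙丙]
    }
}
```

Under this reading, "甲乙丙" can be found by typing yb or b even when the user does not remember "甲".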
Optionally, as shown in Fig. 1,
when the Chinese name includes at least two Chinese characters, the part of Step 102 that obtains at least one character string beginning with each pinyin initial may be implemented as follows:
the pinyin initials are combined in the order of their characters in the name to form a to-be-cut character string. This string is then cut at least once with different cut positions and numbers of cuts. Each cutting yields at least two character strings, and every resulting string contains at least one pinyin initial.
For example, combining the pinyin initials j, y and b in the order of the three characters in "甲乙丙" forms the to-be-cut string jyb. Cutting jyb repeatedly with different cut positions and numbers of cuts yields 5 character strings in total: j, y, b, jy and yb.
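One way to enumerate "different cut positions and numbers of cuts" is to treat each of the n-1 gaps between initials as either cut or not cut. A bitmask over the gaps then lists every way of cutting at least once. The Java sketch below illustrates that idea; it is not the patent's own code. `CutEnumerationSketch` is a hypothetical name, and the `int` mask limits it to strings of at most 31 initials.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch: every subset of the n-1 gaps between initials is one way of
 * cutting, so iterating a bitmask over the gaps enumerates every cutting at least once.
 */
class CutEnumerationSketch {

    static Set<String> allCutPieces(String toBeCut) {
        Set<String> pieces = new LinkedHashSet<>();
        int gaps = toBeCut.length() - 1;
        for (int mask = 1; mask < (1 << gaps); mask++) {   // mask 0 means "no cut", skipped
            int pieceStart = 0;
            for (int gap = 0; gap < gaps; gap++) {
                if ((mask & (1 << gap)) != 0) {             // cut right after character 'gap'
                    pieces.add(toBeCut.substring(pieceStart, gap + 1));
                    pieceStart = gap + 1;
                }
            }
            pieces.add(toBeCut.substring(pieceStart));      // the piece after the last cut
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(allCutPieces("jyb")); // [j, yb, jy, b, y]
    }
}
```

Together, the pieces are every run of adjacent initials shorter than the whole string, which matches the 5 strings listed above. The number of cuttings is 2^(n-1) - 1, so this brute-force form suits short names such as program titles.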
The to-be-cut string consists of all the pinyin initials of the name. Cutting it multiple times with different combinations of cut positions and numbers of cuts yields every string formed by an ordered run of one or more consecutive initials. Together, these strings cover the initials of every run of adjacent characters in the name. Once they are used as indexes, the user can find the name by entering the pinyin initial of any single character, or the initials of any run of adjacent characters. This has two benefits. First, the user can search with the initials of adjacent characters at any position in the name, which makes searching more convenient. Second, each name has multiple indexes, so entering the pinyin string of any one index quickly retrieves the one or more Chinese names associated with it. This relatively fuzzy style of search speeds up retrieval.
Optionally, the cutting with different cut positions and numbers of cuts can be performed by a recursive algorithm. The to-be-cut string is taken as the initial input and cutting operations are performed on it. The at least two results output by each cutting operation are used as index strings.
For example, with jyb as the initial input, the recursive algorithm performs cutting operations on jyb. The first cutting operation outputs jy and yb, which become two indexes of "甲乙丙". The second round of cutting takes jy and yb as inputs: jy yields j and y, and yb yields y and b. The resulting strings j, y and b become 3 more indexes of "甲乙丙"; y is produced twice but forms a single index.
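The worked example suggests that each cutting operation forms one output by dropping the last initial and the other by dropping the first initial. The Java sketch below works under that assumption; it is not the script the embodiment refers to. `RecursiveCutSketch`, `cut` and `cutAll` are hypothetical names. The sketch also skips re-expanding a string that was already produced, such as the second y.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical reading of the recursive cutting example: a cutting operation on a
 * string of length >= 2 outputs the string without its last initial and the string
 * without its first initial, and each new output is fed back in as the next input.
 */
class RecursiveCutSketch {

    static Set<String> cutAll(String toBeCut) {
        Set<String> out = new LinkedHashSet<>();   // insertion order = order of output
        cut(toBeCut, out);
        return out;
    }

    private static void cut(String input, Set<String> out) {
        if (input.length() < 2) {
            return;                                         // a single initial cannot be cut
        }
        String left = input.substring(0, input.length() - 1);  // e.g. jyb -> jy
        String right = input.substring(1);                      // e.g. jyb -> yb
        boolean leftIsNew = out.add(left);
        boolean rightIsNew = out.add(right);
        if (leftIsNew) {
            cut(left, out);                                  // expand each string only once
        }
        if (rightIsNew) {
            cut(right, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(cutAll("jyb")); // [jy, yb, j, y, b]
    }
}
```

Called on jyb, the sketch returns jy, yb, j, y and b in that order, the same five strings as in Step 205 of the embodiment. The set-based deduplication keeps the work proportional to the roughly n²/2 distinct runs, instead of growing exponentially.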
Cutting the to-be-cut string recursively is fast and accurate. It ensures that the resulting strings cover every ordered combination of consecutive pinyin initials, which in turn keeps searching convenient and fast for the user.
Optionally, the strings obtained by cutting are associated with the Chinese name as its indexes, and the to-be-cut string itself is also associated with the name as one of its indexes.
This has two benefits. First, a user searching for a Chinese name may enter the pinyin initials of all its characters, and using the to-be-cut string as an index ensures the name is found in that case. Second, the to-be-cut string reflects every character of the name, so relatively few names share it as an index. A search with the full to-be-cut string therefore returns fewer names, and the user can quickly pick out the desired one, which further improves the search experience.
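A small inverted index shows the trade-off between short keys, which match many names, and the full to-be-cut string, which matches few. This Java sketch is hypothetical: a second title, "西游记" (initials xyj), is added purely for illustration, and the class and method names are invented.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical inverted index: cut results plus the whole to-be-cut string as keys. */
class FullStringIndexSketch {

    private final Map<String, Set<String>> index = new TreeMap<>(); // key -> names

    void add(String chineseName, String toBeCut) {
        Set<String> keys = new LinkedHashSet<>();
        int n = toBeCut.length();
        for (int start = 0; start < n; start++) {
            for (int end = start + 1; end <= n; end++) {
                if (end - start < n) {
                    keys.add(toBeCut.substring(start, end)); // a string produced by cutting
                }
            }
        }
        keys.add(toBeCut); // the to-be-cut string itself is also an index
        for (String key : keys) {
            index.computeIfAbsent(key, k -> new TreeSet<>()).add(chineseName);
        }
    }

    /** Exact-match lookup of what the user typed; unknown keys give an empty result. */
    Set<String> search(String typed) {
        return Collections.unmodifiableSet(index.getOrDefault(typed, Collections.emptySet()));
    }

    public static void main(String[] args) {
        FullStringIndexSketch idx = new FullStringIndexSketch();
        idx.add("甲乙丙", "jyb");
        idx.add("西游记", "xyj");             // second title, added only for illustration
        System.out.println(idx.search("y"));   // both titles share the key y
        System.out.println(idx.search("jyb")); // [甲乙丙]
    }
}
```

Typing y matches both titles, while typing jyb matches only "甲乙丙".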
Optionally, as shown in Fig. 1,
when Step 101 obtains the pinyin initial of each Chinese character in the name, a pre-created jar package can be used to translate each character in the name into its full pinyin spelling. Then, for each character, the jar package extracts the first letter of the character's full pinyin spelling as its pinyin initial.
For example, using the pre-created jar package, "甲" in "甲乙丙" is translated into the full spelling jia, "乙" into yi and "丙" into bing. The jar package is then used again to extract j from jia as the initial of "甲", y from yi as the initial of "乙", and b from bing as the initial of "丙".
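The patent does not name the jar package. In practice, a library such as pinyin4j could play that role; its `PinyinHelper.toHanyuPinyinStringArray(char)` returns tone-numbered readings such as jia3. To stay self-contained, the Java sketch below replaces the jar with a hand-written lookup table that covers only the example characters. `PinyinInitialsSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the "predefined jar package": a tiny lookup table replaces
 * a real character-to-pinyin library so that the sketch runs on its own.
 */
class PinyinInitialsSketch {

    // Assumed, hand-written readings for the example name only.
    private static final Map<Character, String> FULL_PINYIN = new HashMap<>();
    static {
        FULL_PINYIN.put('甲', "jia");
        FULL_PINYIN.put('乙', "yi");
        FULL_PINYIN.put('丙', "bing");
    }

    /** Translate one character into its full pinyin spelling. */
    static String toFullPinyin(char hanzi) {
        String spelling = FULL_PINYIN.get(hanzi);
        if (spelling == null) {
            throw new IllegalArgumentException("no pinyin for " + hanzi);
        }
        return spelling;
    }

    /** Take the first letter of each spelling, in name order. */
    static String toBeCutString(String chineseName) {
        StringBuilder initials = new StringBuilder();
        for (char hanzi : chineseName.toCharArray()) {
            initials.append(toFullPinyin(hanzi).charAt(0));
        }
        return initials.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFullPinyin('甲'));      // jia
        System.out.println(toBeCutString("甲乙丙")); // jyb
    }
}
```

Iterating over `char` values assumes characters in the Basic Multilingual Plane. A real jar would also have to choose a reading for polyphonic characters; for example, 行 can be read xing or hang.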
A jar package is defined in advance. Whenever an index needs to be created for a Chinese name, the package can be called to obtain the pinyin initial of each character in the name conveniently. In addition, a jar package can be imported directly into a project and its methods used there. The same predefined package can therefore be introduced into different Chinese-name indexing systems, which reduces development effort.
The method is described in more detail below, taking as an example the creation of indexes for the Chinese names of on-demand TV programs in a smart set-top box. As shown in Fig. 2, the method may include the following steps:
Step 201: obtain the Chinese name for which an index is to be created.
In one embodiment of the invention, after a new on-demand TV program is added to the smart set-top box, an index must be created for the new program's Chinese name. The user can then find the program by entering a few letters with the remote control. Before the index is created, the Chinese name of the newly added program is first obtained.
For example, the Chinese name "甲乙丙" of the newly added on-demand TV program A is obtained.
Step 202: translate each Chinese character in the Chinese name into its full pinyin spelling by means of a predefined jar package.
In one embodiment of the invention, a jar package is defined in advance, or an existing one is obtained. Each Chinese character in the new program's Chinese name is then translated into its full pinyin spelling with it.
For example, the predefined jar package translates the characters of "甲乙丙", the Chinese name of the newly added program A, as follows: "甲" into jia, "乙" into yi and "丙" into bing.
Step 203: extract the pinyin initial of each Chinese character from its full pinyin spelling by means of the jar package.
In one embodiment of the invention, after the full pinyin spelling of each character in the name has been obtained, the jar package is used again to extract each character's pinyin initial from its full spelling. This gives the pinyin initial of every character in the name.
For example, j is extracted from jia as the initial of "甲", y from yi as the initial of "乙", and b from bing as the initial of "丙".
Step 204: combine the pinyin initials of the Chinese characters to form a to-be-cut character string.
In one embodiment of the invention, after the pinyin initial of each character in the name has been obtained, the initials are combined in the order of their characters in the name to form the to-be-cut string.
For example, following the order of "甲", "乙" and "丙" in "甲乙丙", their pinyin initials j, y and b are combined into the to-be-cut string jyb.
Step 205: perform cutting operations on the to-be-cut string with a recursive algorithm to obtain at least one character string.
In one embodiment of the invention, the to-be-cut string formed from the pinyin initials of the name's characters is taken as the initial input. It is cut by a recursive algorithm, and every character string output is collected.
For example, with jyb as the initial input, the recursive algorithm cuts jyb and outputs the character strings jy, yb, j, y and b.
Step 206: associate each character string, and the to-be-cut string, with the Chinese name as indexes of the name.
In one embodiment of the invention, after the character strings have been obtained by cutting, each of them and the to-be-cut string are associated with the Chinese name as its indexes. The associations between the new program's Chinese name and these strings are stored in the smart set-top box. When the user enters any one of the strings with the remote control, the newly added on-demand TV program is found.
For example, the strings jy, yb, j, y and b, and the to-be-cut string jyb, are each associated with "甲乙丙", the Chinese name of the newly added program A. Thus jy, yb, j, y, b and jyb all serve as indexes of "甲乙丙". When the user sends any one of jy, yb, j, y, b and jyb to the smart set-top box with the remote control, the set-top box finds "甲乙丙" through the corresponding index and then displays the newly added program A to the user.
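Steps 201 to 206 can be combined in one Java sketch of the set-top box side. Everything here is an assumption made for illustration. The pinyin table stands in for the jar package. The recursive cut follows the reading of the jyb example (drop the last initial, drop the first initial). The index lives in memory rather than in the box's storage, and all class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical end-to-end sketch of Steps 201-206 for a set-top box program catalogue. */
class SetTopBoxIndexSketch {

    private final Map<Character, String> pinyinTable;                 // stand-in for the jar
    private final Map<String, Set<String>> index = new HashMap<>();   // key -> program names

    SetTopBoxIndexSketch(Map<Character, String> pinyinTable) {
        this.pinyinTable = pinyinTable;
    }

    /** Steps 201-206 for one newly added on-demand program. */
    void addProgram(String chineseName) {
        StringBuilder toBeCut = new StringBuilder();
        for (char hanzi : chineseName.toCharArray()) {
            String full = pinyinTable.get(hanzi);                       // Step 202
            if (full == null) {
                throw new IllegalArgumentException("no pinyin for " + hanzi);
            }
            toBeCut.append(full.charAt(0));                             // Steps 203-204
        }
        Set<String> keys = new LinkedHashSet<>();
        cut(toBeCut.toString(), keys);                                  // Step 205
        keys.add(toBeCut.toString());                                   // Step 206: whole string too
        for (String key : keys) {
            index.computeIfAbsent(key, k -> new TreeSet<>()).add(chineseName);
        }
    }

    /** Recursive cutting: drop the last and the first initial, expanding each new string once. */
    private static void cut(String input, Set<String> out) {
        if (input.length() < 2) {
            return;
        }
        String left = input.substring(0, input.length() - 1);
        String right = input.substring(1);
        boolean leftIsNew = out.add(left);
        boolean rightIsNew = out.add(right);
        if (leftIsNew) cut(left, out);
        if (rightIsNew) cut(right, out);
    }

    /** What the box looks up when the user types letters on the remote. */
    Set<String> search(String typed) {
        return Collections.unmodifiableSet(index.getOrDefault(typed, Collections.emptySet()));
    }

    public static void main(String[] args) {
        Map<Character, String> table = new HashMap<>();
        table.put('甲', "jia");
        table.put('乙', "yi");
        table.put('丙', "bing");
        SetTopBoxIndexSketch box = new SetTopBoxIndexSketch(table);
        box.addProgram("甲乙丙");                                  // program A in the example
        for (String typed : new String[] {"jy", "yb", "j", "y", "b", "jyb"}) {
            System.out.println(typed + " -> " + box.search(typed)); // each finds 甲乙丙
        }
    }
}
```

Typing jb finds nothing, because j and b belong to characters that are not adjacent in the name.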
In one embodiment of the invention, the recursive cutting operation on the to-be-cut string can be implemented with the following script:
As shown in Fig. 3 and Fig. 4, an embodiment of the invention provides a device for creating a Chinese-name index. The device embodiment may be implemented in software, in hardware, or in a combination of the two. At the hardware level, Fig. 3 is a hardware structure diagram of the equipment on which the device is located. Besides the processor, memory, network interface and non-volatile storage shown in Fig. 3, the equipment may include other hardware, such as a forwarding chip responsible for processing packets. In a software implementation, as shown in Fig. 4, the device is a logical entity: it is formed when the CPU of the equipment reads the corresponding computer program instructions from non-volatile storage into memory and runs them. The device for creating a Chinese-name index provided in this embodiment includes an acquiring unit 401, a cutting unit 402 and an associating unit 403;
the acquiring unit 401 is configured to obtain the pinyin initial of each Chinese character in a Chinese name;
the cutting unit 402 is configured to obtain, for each pinyin initial obtained by the acquiring unit 401, at least one character string beginning with that initial. In each string, any two adjacent pinyin initials correspond to two characters that are adjacent in the name, and the initials appear in the same order as their characters in the name;
the associating unit 403 is configured to associate each character string obtained by the cutting unit 402 with the Chinese name as an index of the name.
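The three units map naturally onto small interfaces that a device implementation wires together. This Java sketch is hypothetical. The interface names, method signatures and the toy lambdas in `main` are inventions used only to show the data flow from unit 401 to unit 403.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interfaces mirroring units 401-403; all names are invented. */
interface AcquiringUnit { String pinyinInitials(String chineseName); }          // unit 401
interface CuttingUnit { List<String> stringsFrom(String initials); }            // unit 402
interface AssociatingUnit { void associate(String key, String chineseName); }   // unit 403

/** Wires the three units together in the order the method embodiment describes. */
class ChineseNameIndexDevice {
    private final AcquiringUnit acquiring;
    private final CuttingUnit cutting;
    private final AssociatingUnit associating;

    ChineseNameIndexDevice(AcquiringUnit acquiring, CuttingUnit cutting, AssociatingUnit associating) {
        this.acquiring = acquiring;
        this.cutting = cutting;
        this.associating = associating;
    }

    void index(String chineseName) {
        String initials = acquiring.pinyinInitials(chineseName);      // unit 401
        for (String key : cutting.stringsFrom(initials)) {             // unit 402
            associating.associate(key, chineseName);                   // unit 403
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = new HashMap<>();
        AcquiringUnit acquiring = name -> "jyb";   // placeholder for the jar-based lookup
        CuttingUnit cutting = initials -> {        // toy cutting: every run of initials
            List<String> out = new ArrayList<>();
            for (int s = 0; s < initials.length(); s++) {
                for (int e = s + 1; e <= initials.length(); e++) {
                    out.add(initials.substring(s, e));
                }
            }
            return out;
        };
        AssociatingUnit associating =
                (key, name) -> store.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        new ChineseNameIndexDevice(acquiring, cutting, associating).index("甲乙丙");
        System.out.println(store.get("yb")); // [甲乙丙]
    }
}
```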
Optionally, as shown in Fig. 5,
when the Chinese name includes at least two Chinese characters, the cutting unit 402 includes a combining subunit 4021 and a cutting subunit 4022;
the combining subunit 4021 is configured to combine the pinyin initials in the order of their characters in the Chinese name to form a to-be-cut character string;
the cutting subunit 4022 is configured to cut the to-be-cut string formed by the combining subunit 4021 at least once, with different cut positions and numbers of cuts, to obtain at least two character strings, each of which includes at least one pinyin initial.
Optionally, as shown in Fig. 5,
the cutting subunit 4022 is configured to use a recursive algorithm that takes the to-be-cut string as its initial input and performs cutting operations on it. The at least two results output by each cutting operation are used as the corresponding number of character strings.
Optionally, as shown in Fig. 5,
the associating unit 403 is further configured to associate the to-be-cut string with the Chinese name as one index of the name.
Optionally, as shown in Fig. 6,
the acquiring unit 401 includes a translating subunit 4011 and an extracting subunit 4012;
the translating subunit 4011 is configured to translate, by means of a predefined jar package, each Chinese character in the Chinese name into its full pinyin spelling;
the extracting subunit 4012 is configured to extract, for each Chinese character and by means of the jar package, the first letter of the character's full pinyin spelling as its pinyin initial.
Note that the information exchange and execution processes between the units of the device are based on the same concept as the method embodiments. Their details can be found in the description of the method embodiments and are not repeated here.
An embodiment of the invention further provides a computer-readable storage medium that stores execution instructions. When a processor of a storage controller executes the instructions, the storage controller performs the method for creating a Chinese-name index provided by any of the foregoing embodiments.
An embodiment of the invention further provides a storage controller that includes a processor, a memory and a bus.
The memory stores execution instructions, and the processor is connected to the memory through the bus. When the storage controller runs, the processor executes the instructions stored in the memory, and the storage controller performs the method for creating a Chinese-name index provided by any of the foregoing embodiments.
In summary, the method and device for creating a Chinese-name index provided by the embodiments of the invention have at least the following beneficial effects:
1. In the embodiments of the present invention, after the first pinyin letter of each Chinese character in a Chinese name is obtained, at least one character string beginning with each first pinyin letter is obtained, such that any two adjacent first pinyin letters in each character string correspond to two Chinese characters that are adjacent in the Chinese name, and the order of the first pinyin letters in each character string is the same as the order of the corresponding Chinese characters in the name. Each obtained character string is then associated with the Chinese name and serves as an index of it. Because character strings beginning with the first pinyin letter of every Chinese character in the name are used as indexes, a user who does not know the first Chinese character of a name can still retrieve it by entering the first pinyin letters of one or more of the later characters. This makes the Chinese name retrievable in that case and improves the user's experience when searching for Chinese names.
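A minimal sketch of the index strings described in item 1, assuming the first pinyin letters have already been joined into one string (the function name `strings_starting_at_each_letter` is hypothetical). For every starting position it lists each contiguous run of letters beginning there, so adjacent letters always come from adjacent characters and keep the order of the name.

```python
def strings_starting_at_each_letter(initials: str) -> list[list[str]]:
    """For each start position, list every contiguous run of letters starting there."""
    runs = []
    for start in range(len(initials)):
        # initials[start:start + 1], initials[start:start + 2], ... up to the end
        runs.append([initials[start:end] for end in range(start + 1, len(initials) + 1)])
    return runs


print(strings_starting_at_each_letter("zxm"))
# [['z', 'zx', 'zxm'], ['x', 'xm'], ['m']]
```

For a name such as 张小明 (first letters z, x, m), the second group contains "xm", so a user who only remembers 小明 can still find the name.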
2. In the embodiments of the present invention, when the character strings used as indexes of a Chinese name are obtained, the first pinyin letters of the Chinese characters in the name are first combined in order to form a character string to be split. The string to be split is then split multiple times with different split positions and numbers of splits, yielding multiple character strings. Together, these character strings cover every combination of the first pinyin letters of any number of adjacent Chinese characters in the name. A user can therefore retrieve the Chinese name by entering the first pinyin letters of any number of consecutive characters at any position, which makes retrieval more convenient.
3. In the embodiments of the present invention, when the character string to be split is split, it can be taken as the initial input of a recursive algorithm that performs the splitting operation. The string can thus be split quickly and accurately, ensuring that the character strings produced by the splitting operations cover all ordered combinations of the first pinyin letters, and thereby ensuring that users can retrieve Chinese names conveniently and quickly.
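A recursive variant, sketched under the assumption that each splitting operation cuts its input at one position into two results, both of which are kept as index strings and fed back in as new inputs. `recursive_split` is a hypothetical name, and the `functools.lru_cache` memoization is an added choice rather than something the source specifies; it avoids re-splitting the same substring.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def recursive_split(s: str) -> frozenset[str]:
    """Cut s once at every possible position, keep both halves, and recurse on each.

    The input itself is not included; only outputs of splitting operations are.
    """
    results = set()
    for i in range(1, len(s)):
        left, right = s[:i], s[i:]
        results.update((left, right))
        results |= recursive_split(left)
        results |= recursive_split(right)
    return frozenset(results)


print(sorted(recursive_split("zxm")))  # ['m', 'x', 'xm', 'z', 'zx']
```

For "zxm" this yields the same five strings as enumerating cut patterns directly: z, x, m, zx and xm.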
4. In the embodiments of the present invention, in addition to associating the character strings obtained by splitting the string to be split with the Chinese name as its indexes, the string to be split itself is also associated with the Chinese name as one of its indexes. A user who enters the first pinyin letters of all the Chinese characters in a name can therefore retrieve it more quickly, which improves retrieval speed.
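A sketch of the resulting index, assuming an in-memory dictionary from index string to the set of Chinese names it retrieves (`build_index` and `index_keys` are hypothetical names, and the first letters are passed in directly). Following item 4, the unsplit string of all first letters is added alongside the split results.

```python
from collections import defaultdict


def index_keys(initials: str) -> set[str]:
    """Index strings for one name: every split result plus the unsplit string."""
    if not initials:
        return set()
    n = len(initials)
    # Proper contiguous runs, i.e. what splitting the string can produce.
    split_results = {
        initials[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if (i, j) != (0, n)
    }
    # Item 4: the unsplit string is also associated with the name.
    return split_results | {initials}


def build_index(names_to_initials: dict[str, str]) -> dict[str, set[str]]:
    """Map each index string to the set of Chinese names it retrieves."""
    index = defaultdict(set)
    for name, initials in names_to_initials.items():
        for key in index_keys(initials):
            index[key].add(name)
    return dict(index)


index = build_index({"张小明": "zxm", "王小红": "wxh"})
print(sorted(index["x"]))    # ['张小明', '王小红']
print(sorted(index["zxm"]))  # ['张小明']
print(index.get("zm"))       # None
```

A query for "zm" finds nothing, because 张 and 明 are not adjacent in 张小明.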
It should be noted that herein, such as first and second etc relational terms are used merely to an entity
Or operation makes a distinction with another entity or operation, and not necessarily require or imply exist between these entities or operation
Any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant be intended to it is non-
It is exclusive to include, so that process, method, article or equipment including a series of key elements not only include those key elements,
But also other key elements including being not expressly set out, or also include solid by this process, method, article or equipment
Some key elements.In the absence of more restrictions, by sentence " including the key element that a 〃 〃 " is limited, it is not excluded that
Also there is other identical factor in the process including the key element, method, article or equipment.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by program instructions controlling related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing are merely preferred embodiments of the present invention, intended only to illustrate its technical solutions and not to limit its protection scope. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention falls within the protection scope of the present invention.
Claims (10)
1. A method for creating a Chinese name index, characterized by comprising:
obtaining the first pinyin letter of each Chinese character in a Chinese name;
obtaining, for each first pinyin letter, at least one character string beginning with that first pinyin letter, wherein any two adjacent first pinyin letters in the character string correspond to two Chinese characters that are adjacent in the Chinese name, and the order of the first pinyin letters in the character string is the same as the order of the corresponding Chinese characters in the Chinese name; and
associating each obtained character string with the Chinese name as an index of the Chinese name.
2. The method according to claim 1, characterized in that, when the Chinese name includes at least two Chinese characters, obtaining, for each first pinyin letter, at least one character string beginning with that first pinyin letter comprises:
combining the first pinyin letters in the order of their corresponding Chinese characters in the Chinese name to form a character string to be split; and
splitting the character string to be split at least once with different split positions and numbers of splits to obtain at least two character strings, wherein each character string includes at least one first pinyin letter.
3. The method according to claim 2, characterized in that splitting the character string to be split at least once with different split positions and numbers of splits comprises:
performing, by a recursive algorithm with the character string to be split as the initial input, a splitting operation on the character string to be split, and taking the at least two operation results output by each splitting operation as a corresponding number of the character strings.
4. The method according to claim 2, characterized by further comprising, after forming the character string to be split:
associating the character string to be split with the Chinese name as an index of the Chinese name.
5. The method according to any one of claims 1 to 4, characterized in that obtaining the first pinyin letter of each Chinese character in the Chinese name comprises:
translating, by means of a predefined Java archive (jar) package, each Chinese character included in the Chinese name into its corresponding pinyin spelling; and
extracting, for each Chinese character and by means of the jar package, the first letter of that character's pinyin spelling as the first pinyin letter of the Chinese character.
6. A device for creating a Chinese name index, characterized by comprising an acquiring unit, a splitting unit, and an associating unit, wherein:
the acquiring unit is configured to obtain the first pinyin letter of each Chinese character in a Chinese name;
the splitting unit is configured to obtain, for each first pinyin letter obtained by the acquiring unit, at least one character string beginning with that first pinyin letter, wherein any two adjacent first pinyin letters in the character string correspond to two Chinese characters that are adjacent in the Chinese name, and the order of the first pinyin letters in the character string is the same as the order of the corresponding Chinese characters in the Chinese name; and
the associating unit is configured to associate each character string obtained by the splitting unit with the Chinese name as an index of the Chinese name.
7. The device according to claim 6, characterized in that, when the Chinese name includes at least two Chinese characters, the splitting unit comprises a combining subunit and a splitting subunit, wherein:
the combining subunit is configured to combine the first pinyin letters in the order of their corresponding Chinese characters in the Chinese name to form a character string to be split; and
the splitting subunit is configured to split the character string to be split formed by the combining subunit at least once with different split positions and numbers of splits to obtain at least two character strings, wherein each character string includes at least one first pinyin letter.
8. The device according to claim 7, characterized in that
the splitting subunit is configured to perform, by a recursive algorithm with the character string to be split as the initial input, a splitting operation on the character string to be split, and to take the at least two operation results output by each splitting operation as a corresponding number of the character strings.
9. The device according to claim 7, characterized in that
the associating unit is further configured to associate the character string to be split with the Chinese name as an index of the Chinese name.
10. The device according to any one of claims 6 to 9, characterized in that
the acquiring unit comprises a translation subunit and an extraction subunit, wherein:
the translation subunit is configured to translate, by means of a predefined jar package, each Chinese character included in the Chinese name into its corresponding pinyin spelling; and
the extraction subunit is configured to extract, for each Chinese character and by means of the jar package, the first letter of that character's pinyin spelling as the first pinyin letter of the Chinese character.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710616016.9A CN107220390A (en) | 2017-07-26 | 2017-07-26 | A kind of method and device for creating Chinese index |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107220390A | 2017-09-29 |
Family ID: 59954227
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101162146A (en) * | 2007-02-01 | 2008-04-16 | 厦门雅迅网络股份有限公司 | Method for searching interest points according to the first letter of phonation in networking vehicle mounted guidance apparatus |
CN101566989A (en) * | 2008-04-23 | 2009-10-28 | 中国电信股份有限公司 | Method and system for converting information querying codes into Chinese characters |
CN103838876A (en) * | 2014-03-27 | 2014-06-04 | 烽火通信科技股份有限公司 | Method for retrieving document through pinyin and document retrieval system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170929 |