CN108549627A - Chinese character processing method and device - Google Patents
- Publication number: CN108549627A (application CN201810191423.4A)
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
Abstract
Embodiments of the present invention provide a Chinese character processing method and device. At least one element of a Chinese character to be processed is obtained, the elements including the tone of the character's pinyin (Chinese phonetic alphabet), the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; the index position of each element in a preset total element set is determined; the number of occurrences of each element in the character is counted; a pinyin hash vector of the character is generated from the index positions and the occurrence counts; and the pinyin hash vector is processed with a preset embedding neural network to obtain a continuous feature of the character. The invention is robust to Chinese characters that do not appear in a preset dictionary. In addition, because the size of the pinyin hash space is constant, adding Chinese characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the newly added characters need to be added, so scalability is strong.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a Chinese character processing method and device.
Background
At present, deep learning is widely applied in fields such as natural language processing and text translation. When Chinese characters are processed, discrete data such as Chinese characters usually has to be converted into continuous features that can be fed into a deep network. The commonly used method is one-hot embedding, which encodes a Chinese character by its position in a preset dictionary. Although this method allows a deep neural network to be trained end to end, it has the following two drawbacks:
First, in an Internet environment, a preset dictionary generally contains a very large number of Chinese characters, so the embedding matrix that represents the positions of Chinese characters in the preset dictionary is extremely large. If a Chinese character is added to the preset dictionary, the embedding matrix has to be re-created, which results in poor scalability.
Second, when the Chinese character to be processed does not appear in the preset dictionary, the above method cannot find its position in the preset dictionary, and because the position cannot be found, the character ultimately cannot be recognized.
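The second drawback can be illustrated with a minimal sketch of position-based encoding. The three-character dictionary and the function name one_hot are hypothetical and serve only as an illustration; they are not taken from the patent.

```python
# Toy preset dictionary (hypothetical); a real one holds many thousands of characters.
DICTIONARY = ["中", "人", "大"]
POSITION = {ch: i for i, ch in enumerate(DICTIONARY)}

def one_hot(ch):
    """Encode a character by its dictionary position; return None if it is not listed."""
    if ch not in POSITION:
        return None  # no position in the dictionary, so no encoding is possible
    vec = [0] * len(DICTIONARY)
    vec[POSITION[ch]] = 1
    return vec
```

In this sketch a listed character such as 人 receives a vector, whereas 十, which is absent from the toy dictionary, cannot be encoded at all. Adding 十 would also lengthen every vector, which is why the embedding matrix would have to be rebuilt.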
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide a Chinese character processing method and device.
In a first aspect, an embodiment of the present invention provides a Chinese character processing method, the method comprising:
obtaining at least one element of a Chinese character to be processed, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character;
determining the index position of each element in a preset total element set;
counting the number of occurrences of each element in the Chinese character to be processed;
generating a pinyin hash vector of the Chinese character to be processed according to the index positions and the occurrence counts; and
processing the pinyin hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed.
In an optional implementation, generating the pinyin hash vector of the Chinese character to be processed according to the index positions and the occurrence counts includes:
generating an all-zero vector with the same number of dimensions as the preset total element set; and
for the index position of each element in the preset total element set, determining the dimension in the all-zero vector that corresponds to the index position, and updating the value of that dimension to the number of occurrences of the element in the Chinese character to be processed, thereby obtaining the pinyin hash vector of the Chinese character to be processed.
In an optional implementation, the method further includes:
obtaining all the elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; and
taking the union of the elements of all the Chinese characters to obtain the preset total element set, where each element in the preset total element set has a fixed index position.
In an optional implementation, obtaining at least one element of the Chinese character to be processed includes:
determining, according to a preset correspondence between Chinese characters and the elements they include, the elements corresponding to the Chinese character to be processed, and taking them as the elements of the Chinese character to be processed.
In a second aspect, an embodiment of the present invention provides a Chinese character processing device, the device comprising:
a first acquisition module, configured to obtain at least one element of a Chinese character to be processed, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character;
a determining module, configured to determine the index position of each element in a preset total element set;
a counting module, configured to count the number of occurrences of each element in the Chinese character to be processed;
a generation module, configured to generate a pinyin hash vector of the Chinese character to be processed according to the index positions and the occurrence counts; and
a processing module, configured to process the pinyin hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed.
In an optional implementation, the generation module includes:
a generation unit, configured to generate an all-zero vector with the same number of dimensions as the preset total element set;
a determining unit, configured to determine, for the index position of each element in the preset total element set, the dimension in the all-zero vector that corresponds to the index position; and
an updating unit, configured to update the value of that dimension to the number of occurrences of the element in the Chinese character to be processed, thereby obtaining the pinyin hash vector of the Chinese character to be processed.
In an optional implementation, the device further includes:
a second acquisition module, configured to obtain all the elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; and
a union module, configured to take the union of the elements of all the Chinese characters to obtain the preset total element set, where each element in the preset total element set has a fixed index position.
In an optional implementation, the first acquisition module is specifically configured to determine, according to a preset correspondence between Chinese characters and the elements they include, the elements corresponding to the Chinese character to be processed, and to take them as the elements of the Chinese character to be processed.
In a third aspect, an embodiment of the present invention provides a terminal, including a memory, a processor, and a Chinese character processing program stored in the memory and executable on the processor, where the Chinese character processing program, when executed by the processor, implements the steps of the Chinese character processing method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a Chinese character processing program is stored, where the Chinese character processing program, when executed by a processor, implements the steps of the Chinese character processing method of the first aspect.
Compared with the prior art, the present invention has the following advantages:
In embodiments of the present invention, at least one element of the Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; the index position of each element in the preset total element set is then determined; the number of occurrences of each element in the Chinese character to be processed is counted; the pinyin hash vector of the Chinese character to be processed is then generated according to the index positions and the occurrence counts; and the pinyin hash vector is processed with the preset embedding neural network to obtain the continuous feature of the Chinese character to be processed. Because the embodiments of the present invention represent the Chinese characters in the preset dictionary in a pinyin hash space, they are robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding Chinese characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the newly added characters need to be added, so scalability is strong.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flowchart of the steps of an embodiment of a Chinese character processing method of the present invention;
Fig. 2 is a structural block diagram of an embodiment of a Chinese character processing device of the present invention;
Fig. 3 is a structural block diagram of an embodiment of a terminal of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Referring to Fig. 1, a flowchart of the steps of an embodiment of a Chinese character processing method of the present invention is shown. The method may specifically include the following steps:
In step S101, at least one element of the Chinese character to be processed is obtained, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character.
In embodiments of the present invention, each Chinese character corresponds to a pinyin syllable, which is composed of several phonetic letters comprising an initial and a final; and each Chinese character is composed of one stroke, or of several different or identical strokes.
The Chinese character to be processed may be a single Chinese character or a word made up of several Chinese characters. The embodiments of the present invention are described taking a single Chinese character as an example, which does not limit the scope of protection of the invention.
In one embodiment, the pinyin of the Chinese character to be processed and the tone of the pinyin may be obtained first; then the initial is looked up in the pinyin according to a preset initial table, and the final is looked up in the pinyin according to a preset final table; and then the strokes that make up the Chinese character to be processed are obtained.
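The lookup of the initial and the final can be sketched as follows. This is only an illustration under stated assumptions: the table INITIALS holds the 21 standard pinyin initials, the function name split_pinyin is hypothetical, and for brevity the remainder of the syllable is taken as the final instead of being checked against a separate preset final table.

```python
# Assumed preset initial table: the 21 standard pinyin initials.
# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

def split_pinyin(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            # Simplification: whatever follows the initial is treated as the final.
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "an"
```

For the syllable "zhong" this sketch yields the initial "zh" and the final "ong"; a syllable without an initial, such as "an", is returned with an empty initial.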
When the pinyin of the Chinese character to be processed is obtained, it may be obtained from a preset correspondence between Chinese characters and their pinyin. When the tone of the pinyin is obtained, it may be obtained from a preset correspondence between Chinese characters and the tones of their pinyin. And when the strokes that make up the Chinese character to be processed are obtained, they may be obtained from a preset correspondence between Chinese characters and their strokes.
In another embodiment of the present invention, for any Chinese character in the preset dictionary, all the elements of that character may also be obtained in advance, and the character and all of its elements are formed into a table entry that is stored in a preset correspondence between Chinese characters and the elements they include; the same is done for every other Chinese character in the preset dictionary.
Therefore, in this step, the elements corresponding to the Chinese character to be processed can be determined directly from the preset correspondence between Chinese characters and the elements they include, and taken as the elements of the Chinese character to be processed. In this way, a single lookup yields all the elements of the Chinese character to be processed, which improves the efficiency of obtaining them.
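A minimal sketch of such a precomputed correspondence is given below. The table contents, the prefixes "tone:", "initial:", "final:" and "stroke:", and the romanized stroke names are illustrative assumptions; the patent does not prescribe how elements are named or stored.

```python
# Hypothetical lookup table: character -> list of its elements.
# Prefixes such as "stroke:" are an illustrative naming choice that keeps
# tones, initials, finals and strokes from colliding with one another.
ELEMENT_TABLE = {
    "中": ["tone:1", "initial:zh", "final:ong",
           "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"],
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
    "三": ["tone:1", "initial:s", "final:an",
           "stroke:heng", "stroke:heng", "stroke:heng"],
}

def get_elements(ch):
    """Return every element of a character with a single table lookup."""
    return ELEMENT_TABLE.get(ch, [])
```

A single dictionary access replaces the separate pinyin, tone, and stroke lookups. In this sketch a character missing from the table returns an empty list; a fuller implementation would presumably fall back to the separate lookups.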
In step S102, the index position of each element in the preset total element set is determined.
In embodiments of the present invention, the preset total element set is generated from the elements of each Chinese character in the preset dictionary and contains multiple elements. For example, it contains all the strokes used to form Chinese characters, as well as all the initials, all the finals, and all the tones of pinyin. Each stroke is one element, each initial is one element, each final is one element, and each tone is one element, and each element has a corresponding index position in the preset total element set.
In embodiments of the present invention, the preset total element set may be generated in advance as follows:
For example, all the elements of each Chinese character in the preset dictionary are obtained, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; the union of the elements of all the Chinese characters is taken to obtain the preset total element set, where each element in the set has a fixed index position and different elements have different index positions.
For example, if one element of the Chinese character to be processed is "zh", the index position of that element in the preset total element set is queried, that is, the row and column in which the element is located in the preset total element set.
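Building the set by union and querying an index position can be sketched as below. The sketch assumes a flat, one-dimensional index instead of the row-and-column layout mentioned above, and it sorts the elements only to make the assignment reproducible; the function name build_element_index and the toy data are hypothetical.

```python
def build_element_index(element_table):
    """Take the union of all characters' elements and give each a fixed index."""
    universe = set()
    for elements in element_table.values():
        universe.update(elements)
    # Sorting is an assumption that makes the index assignment reproducible.
    return {element: i for i, element in enumerate(sorted(universe))}

# Toy dictionary entries (hypothetical data).
table = {
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
    "大": ["tone:4", "initial:d", "final:a",
           "stroke:heng", "stroke:pie", "stroke:na"],
}
index = build_element_index(table)
```

Because the two characters share the strokes pie and na, the union holds nine distinct elements, and a query such as index["initial:r"] returns that element's fixed position.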
In step S103, the number of occurrences of each element in the Chinese character to be processed is counted.
Alternatively, in another embodiment of the present invention, the number of occurrences of each element in the preset total element set may also be counted.
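Counting can be sketched with the standard library's Counter. The element names and the helper count_elements are illustrative assumptions.

```python
from collections import Counter

def count_elements(elements):
    """Count how many times each element occurs in one character."""
    return Counter(elements)

# 三 (san1) is written with three horizontal strokes.
counts = count_elements(["tone:1", "initial:s", "final:an",
                         "stroke:heng", "stroke:heng", "stroke:heng"])
```

Here the horizontal stroke is counted three times, while an element that does not occur in the character, such as a vertical stroke, has a count of zero.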
In step S104, the pinyin hash vector of the Chinese character to be processed is generated according to the index positions and the occurrence counts.
The pinyin hash vector has multiple dimensions; each dimension corresponds to one index position, and each index position corresponds to one element. After the number of occurrences of an element in the Chinese character to be processed and its index position in the preset total element set have been determined, the dimension corresponding to the index position can be determined and its value set to the number of occurrences. For the dimensions corresponding to the index positions of elements whose number of occurrences is 0, the value is set to 0. The pinyin hash vector is thereby obtained.
In an optional implementation, an all-zero vector with the same number of dimensions as the preset total element set may be generated. Then, for the index position of any element in the preset total element set, the dimension in the all-zero vector that corresponds to the index position is determined, and the value of that dimension is updated to the number of occurrences of the element in the Chinese character to be processed. The same operation is performed for the index position of every other element in the preset total element set, thereby obtaining the pinyin hash vector of the Chinese character to be processed.
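The optional implementation can be sketched as follows, assuming a hypothetical five-element set and the occurrence counts of 三 (san1). The function name pinyin_hash_vector and the element names are not from the patent.

```python
def pinyin_hash_vector(counts, index):
    """Write each element's occurrence count at its index position."""
    vector = [0] * len(index)  # all-zero vector, one dimension per element
    for element, count in counts.items():
        if element in index:
            vector[index[element]] = count
    return vector

# Hypothetical five-element set and the counts for 三 (san1).
index = {"final:an": 0, "initial:s": 1, "stroke:heng": 2,
         "stroke:shu": 3, "tone:1": 4}
counts = {"tone:1": 1, "initial:s": 1, "final:an": 1, "stroke:heng": 3}
vector = pinyin_hash_vector(counts, index)  # [1, 1, 3, 0, 1]
```

The dimension for the vertical stroke stays at 0 because 三 does not contain it, and the vector length equals the size of the element set regardless of which character is encoded.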
In step S105, the pinyin hash vector is processed with the preset embedding neural network to obtain the continuous feature of the Chinese character to be processed.
For the specific way of processing the pinyin hash vector with the preset embedding neural network to obtain the continuous feature, reference may be made to the existing related art; it is not specifically limited in the embodiments of the present invention. After the continuous feature of the Chinese character to be processed is obtained, the semantics of the Chinese character to be processed can be analyzed and classified according to the continuous feature.
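Since the patent leaves the embedding network unspecified, the following is only one possible stand-in: a single dense layer with a tanh activation and randomly initialized, untrained weights, written in plain Python. The names make_dense_layer and embed, the layer sizes, and the choice of activation are all assumptions.

```python
import math
import random

def make_dense_layer(in_dim, out_dim, seed=0):
    """Create small random weights and zero biases (untrained, illustration only)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def embed(hash_vector, layer):
    """Map a pinyin hash vector to a dense, continuous feature vector."""
    weights, bias = layer
    return [math.tanh(sum(w * x for w, x in zip(row, hash_vector)) + b)
            for row, b in zip(weights, bias)]

layer = make_dense_layer(in_dim=5, out_dim=3)
feature = embed([1, 1, 3, 0, 1], layer)  # three real-valued features
```

In practice the weights would be learned together with the downstream task; the point of the sketch is that the input width is fixed by the element set, not by the number of characters in the dictionary.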
In embodiments of the present invention, at least one element of the Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; the index position of each element in the preset total element set is then determined; the number of occurrences of each element in the Chinese character to be processed is counted; the pinyin hash vector of the Chinese character to be processed is then generated according to the index positions and the occurrence counts; and the pinyin hash vector is processed with the preset embedding neural network to obtain the continuous feature of the Chinese character to be processed. Because the embodiments of the present invention represent the Chinese characters in the preset dictionary in a pinyin hash space, they are robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding Chinese characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the newly added characters need to be added, so scalability is strong.
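The robustness claim can be illustrated with a short end-to-end sketch. It assumes a toy three-character dictionary and, because the source does not say how elements missing from the preset total element set are handled, it assumes they are simply skipped. All names and data are hypothetical.

```python
from collections import Counter

# Toy preset dictionary (hypothetical data); 十 is deliberately left out.
DICTIONARY = {
    "中": ["tone:1", "initial:zh", "final:ong",
           "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"],
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
    "大": ["tone:4", "initial:d", "final:a",
           "stroke:heng", "stroke:pie", "stroke:na"],
}
universe = sorted(set(e for elems in DICTIONARY.values() for e in elems))
INDEX = {e: i for i, e in enumerate(universe)}

def encode(elements):
    """Build a pinyin hash vector over the fixed element set."""
    vector = [0] * len(INDEX)
    for element, count in Counter(elements).items():
        if element in INDEX:  # assumption: unknown elements are skipped
            vector[INDEX[element]] = count
    return vector

# 十 (shi2): not in the dictionary, but it shares a tone and two strokes with it.
unseen = encode(["tone:2", "initial:sh", "final:i", "stroke:heng", "stroke:shu"])
```

Although 十 has no dictionary position, it still receives a non-zero vector of the same length as every other character, because three of its elements already exist in the element set. A position-based one-hot encoding would have nothing to return for it.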
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 2, a structural block diagram of an embodiment of a Chinese character processing device of the present invention is shown. The device may specifically include the following modules:
a first acquisition module 11, configured to obtain at least one element of a Chinese character to be processed, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character;
a determining module 12, configured to determine the index position of each element in the preset total element set;
a counting module 13, configured to count the number of occurrences of each element in the Chinese character to be processed;
a generation module 14, configured to generate the pinyin hash vector of the Chinese character to be processed according to the index positions and the occurrence counts; and
a processing module 15, configured to process the pinyin hash vector with the preset embedding neural network to obtain the continuous feature of the Chinese character to be processed.
In an optional implementation, the generation module 14 includes:
a generation unit, configured to generate an all-zero vector with the same number of dimensions as the preset total element set;
a determining unit, configured to determine, for the index position of each element in the preset total element set, the dimension in the all-zero vector that corresponds to the index position; and
an updating unit, configured to update the value of that dimension to the number of occurrences of the element in the Chinese character to be processed, thereby obtaining the pinyin hash vector of the Chinese character to be processed.
In an optional implementation, the device further includes:
a second acquisition module, configured to obtain all the elements of each Chinese character in the preset dictionary, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; and
a union module, configured to take the union of the elements of all the Chinese characters to obtain the preset total element set, where each element in the preset total element set has a fixed index position.
In an optional implementation, the first acquisition module 11 is specifically configured to determine, according to a preset correspondence between Chinese characters and the elements they include, the elements corresponding to the Chinese character to be processed, and to take them as the elements of the Chinese character to be processed.
In embodiments of the present invention, at least one element of the Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial of the pinyin, the final of the pinyin, and the strokes that make up the character; the index position of each element in the preset total element set is then determined; the number of occurrences of each element in the Chinese character to be processed is counted; the pinyin hash vector of the Chinese character to be processed is then generated according to the index positions and the occurrence counts; and the pinyin hash vector is processed with the preset embedding neural network to obtain the continuous feature of the Chinese character to be processed. Because the embodiments of the present invention represent the Chinese characters in the preset dictionary in a pinyin hash space, they are robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding Chinese characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the newly added characters need to be added, so scalability is strong.
As the device embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The present invention also provides a Chinese character processing terminal, which may include a memory, a processor, and a Chinese character processing program stored in the memory and executable on the processor, where the Chinese character processing program, when executed by the processor, implements the steps of any of the Chinese character processing methods described in the present invention.
Fig. 3 is a block diagram of a Chinese character processing terminal 600 according to an exemplary embodiment. For example, the terminal 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 3, the terminal 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor module 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the terminal 600, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions so as to complete all or part of the steps of the Chinese character processing method described above. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the terminal 600. Examples of such data include instructions for any application or method operated on the terminal 600, contact data, phone book data, messages, pictures, videos, and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 606 provides power to the various components of the terminal 600. The power supply component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 600.
The multimedia component 608 includes a screen that provides an output interface between the terminal 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the terminal 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
Audio component 610 is configured as output and/or input audio signal.For example, audio component 610 includes a Mike
Wind (MIC), when terminal 600 is in operation mode, when such as call model, logging mode and speech recognition mode, microphone by with
It is set to reception external audio signal.The received audio signal can be further stored in memory 604 or via communication set
Part 616 is sent.In some embodiments, audio component 610 further includes a loud speaker, is used for exports audio signal.
I/O interface 612 provides an interface between processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
Sensor module 614 includes one or more sensors for providing status assessments of various aspects of terminal 600. For example, sensor module 614 can detect the open/closed state of terminal 600 and the relative positioning of components, for example the display and keypad of terminal 600. Sensor module 614 can also detect a change in position of terminal 600 or of one component of terminal 600, the presence or absence of user contact with terminal 600, the orientation or acceleration/deceleration of terminal 600, and a change in temperature of terminal 600. Sensor module 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor module 614 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor module 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
Communication component 616 is configured to facilitate wired or wireless communication between terminal 600 and other devices. Terminal 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 616 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, terminal 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the Chinese character processing method. Specifically, the Chinese character processing method includes:
Obtaining at least one element included in the pending Chinese character, the elements including the tone of the Chinese phonetic alphabet (pinyin) of the pending Chinese character, the initial consonant included in the pinyin, the final (simple or compound vowel) included in the pinyin, and the strokes composing the pending Chinese character;
Determining the index position of each element in the predicted elemental total collection;
Counting the number of occurrences of each element in the pending Chinese character;
Generating the phonetic hash vector of the pending Chinese character according to the index positions and the occurrence counts;
Processing the phonetic hash vector with a default embedded neural network to obtain the continuous feature of the pending Chinese character.
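The last step above maps a sparse count vector to a dense, continuous feature. The passage does not describe the architecture of the default embedded neural network, so the following is only a minimal sketch under assumptions: a single dense layer followed by tanh, with randomly initialised weights standing in for trained ones. The function name `embed`, the sizes, and the example vector are all hypothetical.

```python
import math
import random


def embed(hash_vector, weights, bias):
    """One dense layer followed by tanh: count vector -> dense feature.

    This layer shape is an assumption; the source does not specify the network.
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, hash_vector)) + b)
        for row, b in zip(weights, bias)
    ]


# Hypothetical sizes: 10 elements in the total collection, 4 output features.
NUM_ELEMENTS, FEATURE_DIM = 10, 4

# Randomly initialised parameters stand in for trained ones.
rng = random.Random(0)
weights = [[rng.gauss(0.0, 0.1) for _ in range(NUM_ELEMENTS)]
           for _ in range(FEATURE_DIM)]
bias = [0.0] * FEATURE_DIM

# Example count vector with one entry per element of the total collection.
hash_vector = [1, 1, 1, 3, 0, 0, 0, 0, 0, 0]
feature = embed(hash_vector, weights, bias)
```

Because the input length depends only on the size of the element collection, the same layer can in principle accept the vector of any character whose elements are in that collection.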
In an optional implementation, generating the phonetic hash vector of the pending Chinese character according to the index positions and the occurrence counts includes:
Generating an all-zero vector with the same dimension as the predicted elemental total collection;
For the index position of each element in the predicted elemental total collection, determining the dimension of the all-zero vector that corresponds to the index position, and updating the value at that dimension to the number of occurrences of the element in the pending Chinese character, thereby obtaining the phonetic hash vector of the pending Chinese character.
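The vector construction described above can be sketched as follows. The element labels such as `"tone:1"` or `"stroke:heng"`, the `index_of` mapping, and the function name are hypothetical illustrations, not names from the source. The sketch assumes every element of the character is present in the collection; how unseen elements are handled is not stated in this passage.

```python
from collections import Counter


def phonetic_hash_vector(elements, index_of):
    """Build a count vector: one dimension per element of the total collection."""
    vector = [0] * len(index_of)           # all-zero vector of matching dimension
    for element, count in Counter(elements).items():
        vector[index_of[element]] = count  # write the occurrence count at its index
    return vector


# Hypothetical total collection with fixed index positions.
index_of = {
    "tone:1": 0, "initial:s": 1, "final:an": 2, "stroke:heng": 3,
    "tone:4": 4, "initial:m": 5, "final:u": 6,
    "stroke:shu": 7, "stroke:pie": 8, "stroke:na": 9,
}

# 三 (san, first tone) is written with three horizontal strokes.
san_elements = ["tone:1", "initial:s", "final:an",
                "stroke:heng", "stroke:heng", "stroke:heng"]
san_vector = phonetic_hash_vector(san_elements, index_of)
# san_vector == [1, 1, 1, 3, 0, 0, 0, 0, 0, 0]
```

The repeated horizontal stroke is what makes the entries counts rather than simple presence flags.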
In an optional implementation, the method further includes:
Obtaining all elements of each Chinese character in a preset dictionary, the elements including the tone of the Chinese phonetic alphabet (pinyin) of the Chinese character, the initial consonant included in the pinyin, the final (simple or compound vowel) included in the pinyin, and the strokes composing the Chinese character;
Taking the union of all elements of each Chinese character to obtain the predicted elemental total collection, wherein each element in the predicted elemental total collection has a fixed index position.
In an optional implementation, obtaining at least one element of the pending Chinese character includes: determining, according to a preset correspondence between Chinese characters and the elements they include, the elements corresponding to the pending Chinese character, and taking them as the elements included in the pending Chinese character.
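One way to picture the preset correspondence is a lookup table from each character to its pinyin and strokes, from which the elements are derived. The table `CHAR_TABLE`, the helper names, the tone-number notation (`san1`), and the simplified initial list are all assumptions for illustration; syllables starting with y or w are treated here as having no initial, which is one of several conventions.

```python
# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_pinyin(syllable):
    """Split e.g. 'san1' into (initial, final, tone); the initial may be ''."""
    body, tone = syllable[:-1], syllable[-1]
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone


# Hypothetical preset correspondence: character -> (pinyin, stroke list).
CHAR_TABLE = {
    "三": ("san1", ["heng", "heng", "heng"]),
    "木": ("mu4", ["heng", "shu", "pie", "na"]),
}


def elements_of(char):
    """Look up a character and return its tone, initial, final, and strokes."""
    syllable, strokes = CHAR_TABLE[char]
    initial, final, tone = split_pinyin(syllable)
    elements = ["tone:" + tone]
    if initial:
        elements.append("initial:" + initial)
    elements.append("final:" + final)
    elements.extend("stroke:" + stroke for stroke in strokes)
    return elements
```

For example, `elements_of("三")` yields the tone, the initial `s`, the final `an`, and three horizontal-stroke entries.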
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example memory 604 including instructions, and the instructions can be executed by processor 620 of terminal 600 to complete the Chinese character processing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. When the instructions in the storage medium are executed by the processor of the terminal, the terminal is enabled to execute the steps of any Chinese character processing method described in the present invention.
The Chinese character processing scheme provided herein is not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct a system embodying the scheme of the present invention is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the invention described herein, and that the above description of a specific language is given to disclose the best mode of the present invention.
In the specification provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention, features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments can be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the Chinese character processing scheme according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.
Claims (10)
1. A Chinese character processing method, characterized in that the method includes:
Obtaining at least one element included in a pending Chinese character, the elements including the tone of the Chinese phonetic alphabet (pinyin) of the pending Chinese character, the initial consonant included in the pinyin, the final (simple or compound vowel) included in the pinyin, and the strokes composing the pending Chinese character;
Determining the index position of each element in a predicted elemental total collection;
Counting the number of occurrences of each element in the pending Chinese character;
Generating a phonetic hash vector of the pending Chinese character according to the index positions and the occurrence counts;
Processing the phonetic hash vector with a default embedded neural network to obtain the continuous feature of the pending Chinese character.
2. The method according to claim 1, characterized in that generating the phonetic hash vector of the pending Chinese character according to the index positions and the occurrence counts includes:
Generating an all-zero vector with the same dimension as the predicted elemental total collection;
For the index position of each element in the predicted elemental total collection, determining the dimension of the all-zero vector that corresponds to the index position, and updating the value at that dimension to the number of occurrences of the element in the pending Chinese character, thereby obtaining the phonetic hash vector of the pending Chinese character.
3. The method according to claim 1, characterized in that the method further includes:
Obtaining all elements of each Chinese character in a preset dictionary, the elements including the tone of the Chinese phonetic alphabet (pinyin) of the Chinese character, the initial consonant included in the pinyin, the final (simple or compound vowel) included in the pinyin, and the strokes composing the Chinese character;
Taking the union of all elements of each Chinese character to obtain the predicted elemental total collection, wherein each element in the predicted elemental total collection has a fixed index position.
4. The method according to claim 1, characterized in that obtaining at least one element of the pending Chinese character includes:
Determining, according to a preset correspondence between Chinese characters and the elements they include, the elements corresponding to the pending Chinese character, and taking them as the elements included in the pending Chinese character.
5. A Chinese character processing device, characterized in that the device includes:
A first acquisition module, configured to obtain at least one element included in a pending Chinese character, the elements including the tone of the Chinese phonetic alphabet (pinyin) of the pending Chinese character, the initial consonant included in the pinyin, the final (simple or compound vowel) included in the pinyin, and the strokes composing the pending Chinese character;
A determining module, configured to determine the index position of each element in a predicted elemental total collection;
A statistical module, configured to count the number of occurrences of each element in the pending Chinese character;
A generation module, configured to generate a phonetic hash vector of the pending Chinese character according to the index positions and the occurrence counts;
A processing module, configured to process the phonetic hash vector with a default embedded neural network to obtain the continuous feature of the pending Chinese character.
6. The device according to claim 5, characterized in that the generation module includes:
A generation unit, configured to generate an all-zero vector with the same dimension as the predicted elemental total collection;
A determination unit, configured to determine, for the index position of each element in the predicted elemental total collection, the dimension of the all-zero vector that corresponds to the index position;
An updating unit, configured to update the value at that dimension to the number of occurrences of the element in the pending Chinese character, thereby obtaining the phonetic hash vector of the pending Chinese character.
7. The device according to claim 5, characterized in that the device further includes:
A second acquisition module, configured to obtain all elements of each Chinese character in a preset dictionary, the elements including the tone of the Chinese phonetic alphabet (pinyin) of the Chinese character, the initial consonant included in the pinyin, the final (simple or compound vowel) included in the pinyin, and the strokes composing the Chinese character;
A union module, configured to take the union of all elements of each Chinese character to obtain the predicted elemental total collection, wherein each element in the predicted elemental total collection has a fixed index position.
8. The device according to claim 5, characterized in that the first acquisition module is specifically configured to: determine, according to a preset correspondence between Chinese characters and the elements they include, the elements corresponding to the pending Chinese character, and take them as the elements included in the pending Chinese character.
9. A terminal, characterized by including: a memory, a processor, and a Chinese character processing program stored on the memory and executable on the processor, wherein the Chinese character processing program, when executed by the processor, implements the steps of the Chinese character processing method according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that a Chinese character processing program is stored on the computer-readable storage medium, and the Chinese character processing program, when executed by a processor, implements the steps of the Chinese character processing method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810191423.4A CN108549627B (en) | 2018-03-08 | 2018-03-08 | Chinese character processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810191423.4A CN108549627B (en) | 2018-03-08 | 2018-03-08 | Chinese character processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108549627A (en) | 2018-09-18 |
CN108549627B (en) | 2019-10-01 |
Family ID=63516115
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810191423.4A Active CN108549627B (en) | 2018-03-08 | 2018-03-08 | Chinese character processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108549627B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111785249A (en) * | 2020-07-10 | 2020-10-16 | 恒信东方文化股份有限公司 | Training method, device and obtaining method of input phoneme of speech synthesis |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678272A (en) * | 2012-09-17 | 2014-03-26 | 北京信息科技大学 | Method for processing unknown words in Chinese-language dependency tree banks |
CN107609185A (en) * | 2017-09-30 | 2018-01-19 | 百度在线网络技术(北京)有限公司 | Method, apparatus, equipment and computer-readable recording medium for POI Similarity Measure |
Non-Patent Citations (3)
Title |
---|
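He Hao et al.: "An automatic classification method for Chinese documents based on N-Gram technology", 《情报学报》 * |
Wang Xiaohua et al.: "Research on a text deduplication method based on N-Gram", 《杭州电子科技大学学报》 * |
Hu Hao et al.: "Research on a Chinese character vector method based on the intrinsic attributes of Chinese characters", 《中文信息学报》 * |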
Also Published As
Publication number | Publication date |
---|---|
CN108549627B (en) | 2019-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108536669B (en) | Literal information processing method, device and terminal | |
CN108399409B (en) | Image classification method, device and terminal | |
CN108256549B (en) | Image classification method, device and terminal | |
CN108171254A (en) | Image tag determines method, apparatus and terminal | |
CN107102746A (en) | Candidate word generation method, device and the device generated for candidate word | |
CN107221330A (en) | Punctuate adding method and device, the device added for punctuate | |
CN107247519A (en) | A kind of input method and device | |
CN105528403B (en) | Target data identification method and device | |
CN108595497A (en) | Data screening method, apparatus and terminal | |
CN109871843A (en) | Character identifying method and device, the device for character recognition | |
CN108563683A (en) | Label addition method, device and terminal | |
CN111339737B (en) | Entity linking method, device, equipment and storage medium | |
JP7116088B2 (en) | Speech information processing method, device, program and recording medium | |
CN110390086A (en) | A kind of method, apparatus and storage medium generating text | |
CN107544684A (en) | A kind of candidate word display methods and device | |
CN107621886A (en) | Method, apparatus and electronic equipment are recommended in one kind input | |
CN108133217B (en) | Characteristics of image determines method, apparatus and terminal | |
CN109144285A (en) | A kind of input method and device | |
CN109558599A (en) | A kind of conversion method, device and electronic equipment | |
CN106886294A (en) | A kind of input method error correction method and device | |
CN107861637A (en) | Character input method, device and computer-readable recording medium | |
CN108573706A (en) | A kind of audio recognition method, device and equipment | |
CN108628461A (en) | A kind of input method and device, a kind of method and apparatus of update dictionary | |
CN108549627B (en) | Chinese character processing method and device | |
CN108984628A (en) | Content description generates the loss value-acquiring method and device of model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||