CN108549627A - Chinese character processing method and device - Google Patents

Chinese character processing method and device

Info

Publication number
CN108549627A
Authority
CN
China
Prior art keywords
chinese character
chinese
pending
phonetic
phonetic alphabet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810191423.4A
Other languages
Chinese (zh)
Other versions
CN108549627B (en)
Inventor
张志伟
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201810191423.4A (patent CN108549627B)
Publication of CN108549627A
Application granted
Publication of CN108549627B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries

Abstract

Embodiments of the present invention provide a Chinese character processing method and device. At least one element of a Chinese character to be processed is obtained, where the elements include the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in a preset total element set is determined, and the number of occurrences of each element in the character to be processed is counted. A pinyin hash vector of the character is generated from the index positions and the occurrence counts, and a preset embedding neural network processes the pinyin hash vector to obtain continuous features of the character. The method is robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new Chinese characters to the preset dictionary does not affect the overall structure of the pinyin hash space; only the elements of the newly added characters need to be added, so the method scales well.

Description

Chinese character processing method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a Chinese character processing method and device.
Background
Deep learning is now widely applied in fields such as natural language processing and text translation. When Chinese characters are processed, discrete data such as Chinese characters must in most cases be converted into continuous features that can be fed into a deep network. A commonly used method is one-hot embedding, which encodes a Chinese character by its position in a preset dictionary. Although this method allows a deep neural network to be trained end to end, it has the following two disadvantages:
First, in an internet environment, a typical preset dictionary contains a very large number of Chinese characters, so the embedding matrix that represents each character's position in the dictionary is extremely large. If new characters are added to the preset dictionary, the embedding matrix must be re-created, which results in poor scalability.
Second, when a Chinese character to be processed does not appear in the preset dictionary, the above method cannot find the character's position in the dictionary, and the character therefore cannot be recognized.
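The two drawbacks can be seen in a minimal Python sketch of position-based one-hot coding. The three-character `preset_dictionary` and the function name `one_hot` are illustrative assumptions and do not come from the patent.

```python
# Hypothetical three-character preset dictionary (illustrative only).
preset_dictionary = ["中", "十", "人"]
# Each character is coded by its position in the dictionary.
position = {ch: i for i, ch in enumerate(preset_dictionary)}


def one_hot(ch):
    """Return the one-hot code of ch, or None if ch has no dictionary position."""
    if ch not in position:
        return None  # second drawback: an unseen character cannot be coded
    code = [0] * len(preset_dictionary)  # first drawback: length grows with the dictionary
    code[position[ch]] = 1
    return code
```

Because the code length equals the dictionary size, appending a character to `preset_dictionary` changes the dimension of every code, which is why anything trained on the old codes has to be rebuilt.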
Summary of the invention
To solve the above technical problems, embodiments of the present invention provide a Chinese character processing method and device.
In a first aspect, an embodiment of the present invention provides a Chinese character processing method, the method comprising:
Obtaining at least one element of a Chinese character to be processed, the elements including the tone of the pinyin of the character to be processed, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
Determining the index position of each element in a preset total element set;
Counting the number of occurrences of each element in the character to be processed;
Generating a pinyin hash vector of the character to be processed according to the index positions and the occurrence counts;
Processing the pinyin hash vector with a preset embedding neural network to obtain continuous features of the character to be processed.
In an optional implementation, generating the pinyin hash vector of the character to be processed according to the index positions and the occurrence counts comprises:
Generating an all-zero vector with the same dimension as the preset total element set;
For the index position of each element in the preset total element set, determining the dimension of the all-zero vector that corresponds to that index position, and updating the value at that dimension to the number of occurrences of the element in the character to be processed, thereby obtaining the pinyin hash vector of the character to be processed.
In an optional implementation, the method further comprises:
Obtaining all elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character;
Taking the union of the elements of all the Chinese characters to obtain the preset total element set, wherein each element in the preset total element set has a fixed index position.
In an optional implementation, obtaining at least one element of the character to be processed comprises:
Determining, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the character to be processed, and using them as the elements of the character to be processed.
In a second aspect, an embodiment of the present invention provides a Chinese character processing device, the device comprising:
A first acquisition module, configured to obtain at least one element of a Chinese character to be processed, the elements including the tone of the pinyin of the character to be processed, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
A determining module, configured to determine the index position of each element in a preset total element set;
A statistics module, configured to count the number of occurrences of each element in the character to be processed;
A generation module, configured to generate a pinyin hash vector of the character to be processed according to the index positions and the occurrence counts;
A processing module, configured to process the pinyin hash vector with a preset embedding neural network to obtain continuous features of the character to be processed.
In an optional implementation, the generation module comprises:
A generation unit, configured to generate an all-zero vector with the same dimension as the preset total element set;
A determination unit, configured to determine, for the index position of each element in the preset total element set, the dimension of the all-zero vector that corresponds to that index position;
An updating unit, configured to update the value at that dimension to the number of occurrences of the element in the character to be processed, thereby obtaining the pinyin hash vector of the character to be processed.
In an optional implementation, the device further comprises:
A second acquisition module, configured to obtain all elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character;
A union module, configured to take the union of the elements of all the Chinese characters to obtain the preset total element set, wherein each element in the preset total element set has a fixed index position.
In an optional implementation, the first acquisition module is specifically configured to determine, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the character to be processed, and to use them as the elements of the character to be processed.
In a third aspect, an embodiment of the present invention provides a terminal, comprising: a memory, a processor, and a Chinese character processing program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the Chinese character processing method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a Chinese character processing program which, when executed by a processor, implements the steps of the Chinese character processing method according to the first aspect.
Compared with the prior art, the present invention has the following advantages:
In embodiments of the present invention, at least one element of a Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in a preset total element set is then determined, and the number of occurrences of each element in the character to be processed is counted. A pinyin hash vector of the character is then generated according to the index positions and the occurrence counts, and a preset embedding neural network processes the pinyin hash vector to obtain continuous features of the character. Because the embodiments represent the Chinese characters of the preset dictionary in a pinyin hash space, they are robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new Chinese characters to the preset dictionary does not affect the overall structure of the pinyin hash space; only the elements of the newly added characters need to be added, so scalability is strong.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the invention are easier to understand, specific embodiments of the invention are set out below.
Description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flow chart of the steps of an embodiment of a Chinese character processing method according to the present invention;
Fig. 2 is a structural block diagram of an embodiment of a Chinese character processing device according to the present invention;
Fig. 3 is a structural block diagram of an embodiment of a terminal according to the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Referring to Fig. 1, a flow chart of the steps of an embodiment of a Chinese character processing method according to the present invention is shown. The method may specifically include the following steps:
In step S101, at least one element of a Chinese character to be processed is obtained, the elements including the tone of the pinyin of the character to be processed, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
In embodiments of the present invention, each Chinese character has a corresponding pinyin. The pinyin is composed of multiple pinyin letters, which include initials and finals. Each Chinese character consists of a single stroke or is composed of multiple strokes, which may be different or identical.
The character to be processed may be a single Chinese character or a word made up of multiple Chinese characters. The embodiments of the present invention are described using a single Chinese character as an example, but this does not limit the scope of protection of the invention.
In one embodiment, the pinyin of the character to be processed and the tone of that pinyin may be obtained first. The initial is then looked up in the pinyin of the character according to a preset initial table, and the final is looked up in the pinyin according to a preset final table. The strokes that make up the character to be processed are then obtained.
The pinyin of the character to be processed may be obtained according to a preset correspondence between Chinese characters and their pinyin. The tone of the pinyin may be obtained according to a preset correspondence between Chinese characters and the tones of their pinyin. The strokes that make up the character to be processed may be obtained according to a preset correspondence between Chinese characters and their strokes.
In another embodiment of the present invention, for any Chinese character in the preset dictionary, all elements of the character may be obtained in advance, and the character together with all of its elements forms a table entry that is stored in the preset correspondence between Chinese characters and the elements they contain. The same is done for every other Chinese character in the preset dictionary.
Therefore, in this step, the elements corresponding to the character to be processed can be determined directly from the preset correspondence between Chinese characters and the elements they contain, and used as the elements of the character to be processed. In this way, a single lookup yields all elements of the character to be processed, which improves the efficiency of obtaining them.
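As a rough illustration of the single-lookup approach, the preset correspondence can be sketched as a Python dictionary. The table contents are hand-written for illustration; the `kind:value` tags, the romanized stroke names (`heng`, `shu`, `pie`, `na`, `hengzhe`) and the names `CHAR_ELEMENTS` and `get_elements` are assumptions of this sketch, not part of the patent.

```python
from typing import Dict, List

# Hand-written illustrative entries: tone, initial, final, then strokes in order.
CHAR_ELEMENTS: Dict[str, List[str]] = {
    "中": ["tone:1", "initial:zh", "final:ong",
           "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"],
    "十": ["tone:2", "initial:sh", "final:i", "stroke:heng", "stroke:shu"],
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
}


def get_elements(ch: str) -> List[str]:
    """Return every element of ch with a single table lookup."""
    return list(CHAR_ELEMENTS[ch])
```

Repeated strokes are kept as repeated entries (for example, two `stroke:shu` entries for 中), so that the later counting step can see how often each element occurs.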
In step S102, the index position of each element in the preset total element set is determined;
In embodiments of the present invention, the preset total element set is generated from the elements of every Chinese character in the preset dictionary and contains multiple elements. For example, the preset total element set includes all strokes used to form Chinese characters, as well as all initials, all finals, and all tones in pinyin. Each stroke is an element, each initial is an element, each final is an element, and each tone is an element, and each element has a corresponding index position in the preset total element set.
In embodiments of the present invention, the preset total element set may be generated in advance as follows:
For example, all elements of each Chinese character in the preset dictionary are obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The union of the elements of all the Chinese characters is then taken to obtain the preset total element set, wherein each element in the preset total element set has a fixed index position, and different elements have different index positions in the preset total element set.
For example, if one element of the character to be processed is "zh", the index position of that element in the preset total element set is looked up, that is, the row and column at which the element is located in the preset total element set.
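The union step and the fixed index positions can be sketched as follows. This is a minimal sketch under stated assumptions: the patent describes the position in terms of a row and column, whereas the sketch uses a single flat index, and sorting the elements is only one possible way to make the positions fixed. The two-character dictionary and all names are hypothetical.

```python
from typing import Dict, List


def build_element_index(char_elements: Dict[str, List[str]]) -> Dict[str, int]:
    """Union the elements of all characters and give each a fixed position."""
    total = set()
    for elements in char_elements.values():
        total.update(elements)
    # Sorting is an assumption of this sketch; it only makes positions reproducible.
    return {element: i for i, element in enumerate(sorted(total))}


# Hypothetical two-character preset dictionary with tagged elements.
dictionary_elements = {
    "十": ["tone:2", "initial:sh", "final:i", "stroke:heng", "stroke:shu"],
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
}
element_index = build_element_index(dictionary_elements)
```

With this data the set has nine distinct elements, because `tone:2` is shared by both characters, and a lookup such as `element_index["initial:sh"]` returns that element's position.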
In step S103, the number of occurrences of each element in the character to be processed is counted;
Alternatively, in another embodiment of the present invention, the number of occurrences of each element in the preset total element set may be counted.
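Counting occurrences within a single character can be sketched with Python's `collections.Counter`. The element list for 中 (zhōng) is written by hand, and its tagged naming is an assumption of this sketch.

```python
from collections import Counter

# Elements of 中: tone 1, initial zh, final ong, strokes shu, hengzhe, heng, shu.
elements = ["tone:1", "initial:zh", "final:ong",
            "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"]

# Map each element to the number of times it occurs in the character.
occurrences = Counter(elements)
```

Here the vertical stroke occurs twice and every other element once; an element that is absent from the character, such as `stroke:pie`, has a count of 0.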
In step S104, a pinyin hash vector of the character to be processed is generated according to the index positions and the occurrence counts;
The pinyin hash vector has multiple dimensions; each dimension corresponds to one index position, and each index position corresponds to one element. After the number of occurrences of an element in the character to be processed and the element's index position in the preset total element set have been determined, the dimension corresponding to that index position can be determined and its value set to the number of occurrences. For elements whose number of occurrences is 0, the value of the dimension corresponding to the element's index position is set to 0. The result is the pinyin hash vector.
In an optional implementation, an all-zero vector with the same dimension as the preset total element set may be generated. Then, for the index position of any one element in the preset total element set, the corresponding dimension of the all-zero vector is determined, and the value at that dimension is updated to the number of occurrences of the element in the character to be processed. The same operation is performed for the index position of every other element in the preset total element set, thereby obtaining the pinyin hash vector of the character to be processed.
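The all-zero-vector procedure can be sketched as below. The seven-entry `element_index` is a hypothetical, deliberately tiny total element set, and the function name `pinyin_hash_vector` is chosen for this sketch only.

```python
from collections import Counter
from typing import Dict, List


def pinyin_hash_vector(elements: List[str], element_index: Dict[str, int]) -> List[int]:
    """Start from an all-zero vector and write each element's count at its position."""
    vector = [0] * len(element_index)
    for element, count in Counter(elements).items():
        vector[element_index[element]] = count
    return vector


# Hypothetical total element set with fixed positions.
element_index = {"tone:1": 0, "initial:zh": 1, "final:ong": 2,
                 "stroke:heng": 3, "stroke:shu": 4, "stroke:hengzhe": 5,
                 "stroke:pie": 6}
zhong = ["tone:1", "initial:zh", "final:ong",
         "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"]
vector = pinyin_hash_vector(zhong, element_index)
```

For 中 this gives `[1, 1, 1, 1, 2, 1, 0]`: the vertical stroke at position 4 occurs twice, and the unused `stroke:pie` dimension stays 0. The sketch raises `KeyError` for an element missing from the index; how such elements should be handled is not specified in the text.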
In step S105, the pinyin hash vector is processed with a preset embedding neural network to obtain continuous features of the character to be processed.
For the specific way in which the preset embedding neural network processes the pinyin hash vector to obtain the continuous features of the character to be processed, reference may be made to existing related techniques; the embodiments of the present invention place no particular limitation on it. After the continuous features of the character to be processed are obtained, the semantics of the character can be analyzed and classified according to those features.
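Since the text leaves the network unspecified, the following is only a stand-in: a single dense layer with a tanh activation and untrained random weights, written in plain Python. The layer sizes, the seed, and the name `make_embedding_layer` are all assumptions; a real system would use trained weights and most likely a deep learning framework.

```python
import math
import random
from typing import Callable, List


def make_embedding_layer(input_dim: int, output_dim: int,
                         seed: int = 0) -> Callable[[List[int]], List[float]]:
    """Build one dense layer with tanh activation and untrained random weights."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(output_dim)]
               for _ in range(input_dim)]

    def embed(vector: List[int]) -> List[float]:
        if len(vector) != input_dim:
            raise ValueError("vector length must equal input_dim")
        # Weighted sum over the input dimensions, squashed into (-1, 1).
        return [math.tanh(sum(vector[i] * weights[i][j] for i in range(input_dim)))
                for j in range(output_dim)]

    return embed


embed = make_embedding_layer(input_dim=7, output_dim=3)
features = embed([1, 1, 1, 1, 2, 1, 0])  # a 7-dimensional pinyin hash vector
```

The point of the sketch is the shape of the mapping: an integer count vector of fixed length goes in, and a short vector of continuous values comes out.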
In embodiments of the present invention, at least one element of a Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in a preset total element set is then determined, and the number of occurrences of each element in the character to be processed is counted. A pinyin hash vector of the character is then generated according to the index positions and the occurrence counts, and a preset embedding neural network processes the pinyin hash vector to obtain continuous features of the character. Because the embodiments represent the Chinese characters of the preset dictionary in a pinyin hash space, they are robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new Chinese characters to the preset dictionary does not affect the overall structure of the pinyin hash space; only the elements of the newly added characters need to be added, so scalability is strong.
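The robustness claim can be illustrated with a character that is absent from a toy dictionary but built entirely from elements the dictionary already covers. In this sketch the dictionary holds only 十 and 人, and the unseen character is 仁 (rén). All data is hand-written, and the sketch assumes the elements of the unseen character are available from some pinyin and stroke resource outside the dictionary.

```python
from collections import Counter

# Hypothetical preset dictionary with tagged elements.
dictionary_elements = {
    "十": ["tone:2", "initial:sh", "final:i", "stroke:heng", "stroke:shu"],
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
}
# Fixed positions over the union of all elements (sorted for reproducibility).
all_elements = set().union(*dictionary_elements.values())
element_index = {e: i for i, e in enumerate(sorted(all_elements))}


def to_vector(elements):
    """Write each element's occurrence count at its fixed position."""
    vector = [0] * len(element_index)
    for element, count in Counter(elements).items():
        vector[element_index[element]] = count
    return vector


# 仁 is not in the dictionary: tone 2, initial r, final en, strokes pie, shu, heng, heng.
ren_unseen = ["tone:2", "initial:r", "final:en",
              "stroke:pie", "stroke:shu", "stroke:heng", "stroke:heng"]
vector = to_vector(ren_unseen)
```

The unseen character still maps to a vector of the same nine dimensions as the dictionary characters, here `[1, 0, 1, 0, 2, 0, 1, 1, 1]`, whereas a position-based one-hot code would have no entry for it.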
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 2, a structural block diagram of an embodiment of a Chinese character processing device according to the present invention is shown. The device may specifically include the following modules:
A first acquisition module 11, configured to obtain at least one element of a Chinese character to be processed, the elements including the tone of the pinyin of the character to be processed, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
A determining module 12, configured to determine the index position of each element in a preset total element set;
A statistics module 13, configured to count the number of occurrences of each element in the character to be processed;
A generation module 14, configured to generate a pinyin hash vector of the character to be processed according to the index positions and the occurrence counts;
A processing module 15, configured to process the pinyin hash vector with a preset embedding neural network to obtain continuous features of the character to be processed.
In an optional implementation, the generation module 14 comprises:
A generation unit, configured to generate an all-zero vector with the same dimension as the preset total element set;
A determination unit, configured to determine, for the index position of each element in the preset total element set, the dimension of the all-zero vector that corresponds to that index position;
An updating unit, configured to update the value at that dimension to the number of occurrences of the element in the character to be processed, thereby obtaining the pinyin hash vector of the character to be processed.
In an optional implementation, the device further comprises:
A second acquisition module, configured to obtain all elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character;
A union module, configured to take the union of the elements of all the Chinese characters to obtain the preset total element set, wherein each element in the preset total element set has a fixed index position.
In an optional implementation, the first acquisition module 11 is specifically configured to determine, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the character to be processed, and to use them as the elements of the character to be processed.
In embodiments of the present invention, at least one element of a Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in a preset total element set is then determined, and the number of occurrences of each element in the character to be processed is counted. A pinyin hash vector of the character is then generated according to the index positions and the occurrence counts, and a preset embedding neural network processes the pinyin hash vector to obtain continuous features of the character. Because the embodiments represent the Chinese characters of the preset dictionary in a pinyin hash space, they are robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new Chinese characters to the preset dictionary does not affect the overall structure of the pinyin hash space; only the elements of the newly added characters need to be added, so scalability is strong.
Since the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The present invention also provides a Chinese character processing terminal, which may include: a memory, a processor, and a Chinese character processing program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of any of the Chinese character processing methods described in the present invention.
Fig. 3 is a block diagram of a Chinese character processing terminal 600 according to an exemplary embodiment. For example, the terminal 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 3, the terminal 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the terminal 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions so as to complete all or part of the steps of the above Chinese character processing method. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the terminal 600. Examples of such data include instructions for any application or method operating on the terminal 600, contact data, phone book data, messages, pictures, videos, and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 606 provides power to the various components of the terminal 600. The power supply component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 600.
The multimedia component 608 includes a screen that provides an output interface between the terminal 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the terminal 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC), which is configured to receive external audio signals when the terminal 600 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 further includes a loudspeaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
Sensor module 614 includes one or more sensors, and the state for providing various aspects for terminal 600 is commented Estimate.For example, sensor module 614 can detect the state that opens/closes of terminal 600, and the relative positioning of component, for example, it is described Component is the display and keypad of terminal 600, and sensor module 614 can be with 600 1 components of detection terminal 600 or terminal Position change, the existence or non-existence that user contacts with terminal 600,600 orientation of device or acceleration/deceleration and terminal 600 Temperature change.Sensor module 614 may include proximity sensor, be configured to detect without any physical contact Presence of nearby objects.Sensor module 614 can also include optical sensor, such as CMOS or ccd image sensor, at As being used in application.In some embodiments, which can also include acceleration transducer, gyro sensors Device, Magnetic Sensor, pressure sensor or temperature sensor.
Communication component 616 is configured to facilitate the communication of wired or wireless way between terminal 600 and other equipment.Terminal 600 can access the wireless network based on communication standard, such as WiFi, 2G or 3G or combination thereof.In an exemplary implementation In example, communication component 616 receives broadcast singal or broadcast related information from external broadcasting management system via broadcast channel. In one exemplary embodiment, the communication component 616 further includes near-field communication (NFC) module, to promote short range communication.Example Such as, NFC module can be based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band (UWB) technology, Bluetooth (BT) technology and other technologies are realized.
In an exemplary embodiment, the terminal 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the Chinese character processing method. Specifically, the Chinese character processing method includes:
Obtaining at least one element of the Chinese character to be processed, where the elements include the tone of the character's Hanyu Pinyin, the initial and the final contained in that Pinyin, and the strokes that make up the character;
Determining the index position of each element in a preset total element set;
Counting the number of times each element occurs in the Chinese character to be processed;
Generating a pinyin hash vector of the Chinese character to be processed from the index positions and the occurrence counts;
Processing the pinyin hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed.
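The steps listed here do not specify the structure of the embedding network. As a minimal sketch only, assuming a single dense layer with tanh activation and untrained random weights, the mapping from a pinyin hash vector (the count vector built from tone, initial, final, and stroke elements) to a continuous feature might look as follows. The names `init_layer` and `embed`, the output size of 4, and the example vector are all hypothetical and do not come from the patent.

```python
import math
import random


def init_layer(in_dim, out_dim, seed=0):
    """Create small random weights and zero biases (assumption: untrained layer)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def embed(hash_vector, weights, bias):
    """Map an integer count vector to a dense, real-valued feature vector."""
    feature = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, hash_vector)) + b
        feature.append(math.tanh(z))  # squashes each component into (-1, 1)
    return feature


# Hypothetical pinyin hash vector over a six-element total set.
hash_vector = [0, 2, 1, 0, 1, 1]
weights, bias = init_layer(len(hash_vector), 4)
feature = embed(hash_vector, weights, bias)
```

In a real system the weights would be learned during training rather than drawn at random; the sketch only shows how sparse integer counts become a fixed-length continuous vector.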
In an optional implementation, generating the pinyin hash vector of the Chinese character to be processed from the index positions and the occurrence counts includes:
Generating an all-zero vector with the same number of dimensions as the preset total element set;
For the index position of each element in the preset total element set, determining the dimension of the all-zero vector that corresponds to that index position, and updating the value at that dimension to the number of times the element occurs in the Chinese character to be processed, thereby obtaining the pinyin hash vector of the Chinese character to be processed.
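The two sub-steps above can be sketched as follows. This is an illustration under stated assumptions: the element labels (such as `stroke:heng` for a horizontal stroke), the five-element index table, and the function name `build_hash_vector` are hypothetical. The example uses the character 三 (sān, first tone), which is written with three horizontal strokes.

```python
from collections import Counter


def build_hash_vector(elements, index_of):
    """Start from an all-zero vector and write each element's count at its index."""
    vector = [0] * len(index_of)          # one dimension per element in the total set
    for element, count in Counter(elements).items():
        vector[index_of[element]] = count  # index position -> dimension
    return vector


# Hypothetical total element set with fixed index positions.
index_of = {
    "final:an": 0,
    "initial:s": 1,
    "stroke:heng": 2,
    "stroke:shu": 3,
    "tone:1": 4,
}

# Elements of 三: tone 1, initial s, final an, three horizontal strokes.
elements_of_san = ["tone:1", "initial:s", "final:an",
                   "stroke:heng", "stroke:heng", "stroke:heng"]

print(build_hash_vector(elements_of_san, index_of))  # [1, 1, 3, 0, 1]
```

Dimensions for elements that the character does not contain (here the vertical stroke) stay at zero, so every character maps to a vector of the same length.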
In an optional implementation, the method further includes:
Obtaining all elements of each Chinese character in a preset dictionary, where the elements include the tone of the character's Hanyu Pinyin, the initial and the final contained in that Pinyin, and the strokes that make up the character;
Taking the union of the elements of all the Chinese characters to obtain the preset total element set, where each element in the preset total element set has a fixed index position.
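A short sketch of how such a total element set could be built is given below. The text only requires that each element have a fixed index position; sorting the union alphabetically is one assumed way to make the positions reproducible, and the two-character dictionary, the element labels, and the name `build_total_element_set` are hypothetical.

```python
def build_total_element_set(dictionary):
    """Union the elements of every character, then assign fixed index positions."""
    all_elements = set()
    for elements in dictionary.values():
        all_elements |= set(elements)
    # Assumption: a sorted order is used so the index is stable across runs.
    return {element: position
            for position, element in enumerate(sorted(all_elements))}


# Hypothetical preset dictionary: 十 (shí, tone 2) and 三 (sān, tone 1).
dictionary = {
    "十": ["tone:2", "initial:sh", "final:i", "stroke:heng", "stroke:shu"],
    "三": ["tone:1", "initial:s", "final:an",
           "stroke:heng", "stroke:heng", "stroke:heng"],
}

index_of = build_total_element_set(dictionary)
```

Because the union discards repeats, the horizontal stroke shared by both characters occupies a single index position.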
In an optional implementation, obtaining at least one element of the Chinese character to be processed includes: determining, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the Chinese character to be processed, and taking them as the elements of the Chinese character to be processed.
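The preset correspondence can be pictured as a lookup table keyed by character. The sketch below assumes a plain in-memory dictionary; the table contents, the element labels, and the name `get_elements` are hypothetical, and how the patent handles characters missing from the table is not stated, so raising an error here is only one possible choice.

```python
# Hypothetical preset correspondence between characters and their elements.
ELEMENT_TABLE = {
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
    "大": ["tone:4", "initial:d", "final:a",
           "stroke:heng", "stroke:pie", "stroke:na"],
}


def get_elements(character, table=ELEMENT_TABLE):
    """Return the preset elements of a character, or fail if it is not in the table."""
    if character not in table:
        raise KeyError(f"no preset element entry for {character!r}")
    return list(table[character])  # copy, so callers cannot modify the table


elements = get_elements("大")
```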
In an exemplary embodiment, a non-transitory computer-readable storage medium containing instructions is also provided, for example the memory 604 containing instructions, and these instructions can be executed by the processor 620 of the terminal 600 to perform the Chinese character processing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. When the instructions in the storage medium are executed by the processor of the terminal, the terminal is able to perform the steps of any of the Chinese character processing methods described in the present invention.
The Chinese character processing scheme provided here is not inherently tied to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to build a system embodying the scheme of the invention is apparent. Moreover, the invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
In the description provided here, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the Chinese character processing scheme according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program or computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A Chinese character processing method, characterized in that the method comprises:
obtaining at least one element of a Chinese character to be processed, the elements including the tone of the Hanyu Pinyin of the Chinese character to be processed, the initial contained in the Hanyu Pinyin, the final contained in the Hanyu Pinyin, and the strokes that make up the Chinese character to be processed;
determining the index position of each element in a preset total element set;
counting the number of times each element occurs in the Chinese character to be processed;
generating a pinyin hash vector of the Chinese character to be processed from the index positions and the occurrence counts;
processing the pinyin hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed.
2. The method according to claim 1, characterized in that generating the pinyin hash vector of the Chinese character to be processed from the index positions and the occurrence counts comprises:
generating an all-zero vector with the same number of dimensions as the preset total element set;
for the index position of each element in the preset total element set, determining the dimension of the all-zero vector that corresponds to that index position, and updating the value at that dimension to the number of times the element occurs in the Chinese character to be processed, thereby obtaining the pinyin hash vector of the Chinese character to be processed.
3. The method according to claim 1, characterized in that the method further comprises:
obtaining all elements of each Chinese character in a preset dictionary, the elements including the tone of the Hanyu Pinyin of the Chinese character, the initial contained in the Hanyu Pinyin, the final contained in the Hanyu Pinyin, and the strokes that make up the Chinese character;
taking the union of the elements of all the Chinese characters to obtain the preset total element set, wherein each element in the preset total element set has a fixed index position.
4. The method according to claim 1, characterized in that obtaining at least one element of the Chinese character to be processed comprises:
determining, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the Chinese character to be processed, and taking them as the elements of the Chinese character to be processed.
5. A Chinese character processing device, characterized in that the device comprises:
a first acquisition module, configured to obtain at least one element of a Chinese character to be processed, the elements including the tone of the Hanyu Pinyin of the Chinese character to be processed, the initial contained in the Hanyu Pinyin, the final contained in the Hanyu Pinyin, and the strokes that make up the Chinese character to be processed;
a determining module, configured to determine the index position of each element in a preset total element set;
a statistics module, configured to count the number of times each element occurs in the Chinese character to be processed;
a generation module, configured to generate a pinyin hash vector of the Chinese character to be processed from the index positions and the occurrence counts;
a processing module, configured to process the pinyin hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed.
6. The device according to claim 5, characterized in that the generation module comprises:
a generation unit, configured to generate an all-zero vector with the same number of dimensions as the preset total element set;
a determination unit, configured to determine, for the index position of each element in the preset total element set, the dimension of the all-zero vector that corresponds to that index position;
an updating unit, configured to update the value at that dimension to the number of times the element occurs in the Chinese character to be processed, thereby obtaining the pinyin hash vector of the Chinese character to be processed.
7. The device according to claim 5, characterized in that the device further comprises:
a second acquisition module, configured to obtain all elements of each Chinese character in a preset dictionary, the elements including the tone of the Hanyu Pinyin of the Chinese character, the initial contained in the Hanyu Pinyin, the final contained in the Hanyu Pinyin, and the strokes that make up the Chinese character;
a union module, configured to take the union of the elements of all the Chinese characters to obtain the preset total element set, wherein each element in the preset total element set has a fixed index position.
8. The device according to claim 5, characterized in that the first acquisition module is specifically configured to: determine, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the Chinese character to be processed, and take them as the elements of the Chinese character to be processed.
9. A terminal, characterized by comprising: a memory, a processor, and a Chinese character processing program stored in the memory and executable on the processor, wherein the Chinese character processing program, when executed by the processor, implements the steps of the Chinese character processing method according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that a Chinese character processing program is stored on the computer-readable storage medium, and the Chinese character processing program, when executed by a processor, implements the steps of the Chinese character processing method according to any one of claims 1 to 4.
CN201810191423.4A 2018-03-08 2018-03-08 Chinese character processing method and device Active CN108549627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810191423.4A CN108549627B (en) 2018-03-08 2018-03-08 Chinese character processing method and device

Publications (2)

Publication Number Publication Date
CN108549627A true CN108549627A (en) 2018-09-18
CN108549627B CN108549627B (en) 2019-10-01

Family

ID=63516115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810191423.4A Active CN108549627B (en) 2018-03-08 2018-03-08 Chinese character processing method and device

Country Status (1)

Country Link
CN (1) CN108549627B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111785249A (en) * 2020-07-10 2020-10-16 恒信东方文化股份有限公司 Training method, device and obtaining method of input phoneme of speech synthesis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678272A (en) * 2012-09-17 2014-03-26 北京信息科技大学 Method for processing unknown words in Chinese-language dependency tree banks
CN107609185A (en) * 2017-09-30 2018-01-19 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and computer-readable recording medium for POI Similarity Measure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
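He Hao et al.: "An automatic classification method for Chinese documents based on N-Gram technology", Journal of the China Society for Scientific and Technical Information *
Wang Xiaohua et al.: "Research on an N-Gram-based text deduplication method", Journal of Hangzhou Dianzi University *
Hu Hao et al.: "Research on a Chinese character vector method based on the inherent attributes of Chinese characters", Journal of Chinese Information Processing *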

Also Published As

Publication number Publication date
CN108549627B (en) 2019-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant