CN108549627B - Chinese character processing method and device - Google Patents

Chinese character processing method and device

Info

Publication number
CN108549627B
Authority
CN
China
Prior art keywords
chinese character
chinese
processed
phonetic
phonetic alphabet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810191423.4A
Other languages
Chinese (zh)
Other versions
CN108549627A (en
Inventor
张志伟
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201810191423.4A
Publication of CN108549627A
Application granted
Publication of CN108549627B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Embodiments of the present invention provide a Chinese character processing method and device. At least one element of a Chinese character to be processed is obtained, where the elements include the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in a preset element total set is determined, and the number of occurrences of each element in the character is counted. A pinyin hash vector of the character is generated from the index positions and the occurrence counts, and the pinyin hash vector is processed by a preset embedding neural network to obtain the continuous features of the character. The invention is robust to Chinese characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the new characters need to be added, so scalability is strong.

Description

Chinese character processing method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a Chinese character processing method and device.
Background art
At present, deep learning is widely applied in fields such as natural language processing and text translation. When processing Chinese characters, discrete data such as characters usually has to be converted into continuous features that can be fed into a deep network. A commonly used method is one-hot embedding, which encodes each character by its position in a preset dictionary. Although this method allows a deep neural network to be trained end to end, it has the following two disadvantages:
First, in an internet environment a typical preset dictionary contains a very large number of Chinese characters, so the embedding matrix that represents the position of each character in the preset dictionary is extremely large. If characters are added to the preset dictionary, the embedding matrix has to be re-created, which leads to poor scalability.
Second, when the character to be processed does not appear in the preset dictionary, the above method cannot find its position in the preset dictionary, and therefore cannot recognize the character.
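The second disadvantage can be illustrated with a minimal Python sketch. The three-character dictionary and the `one_hot` helper below are hypothetical and only show why an encoding based on dictionary position has nothing to return for a character outside the dictionary.

```python
# Hypothetical three-character preset dictionary; real dictionaries are far larger.
preset_dictionary = ["中", "文", "字"]
position = {ch: i for i, ch in enumerate(preset_dictionary)}


def one_hot(ch):
    """Return the one-hot code of ch, or None if ch is not in the dictionary."""
    if ch not in position:
        return None  # no dictionary position, so the character cannot be encoded
    vec = [0] * len(preset_dictionary)
    vec[position[ch]] = 1
    return vec
```

Note also that the length of every code equals the dictionary size, so adding a character changes the shape of everything built on top of it.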
Summary of the invention
To solve the above technical problems, embodiments of the present invention provide a Chinese character processing method and device.
In a first aspect, an embodiment of the present invention provides a Chinese character processing method, the method comprising:
obtaining at least one element of a Chinese character to be processed, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
determining the index position of each element in a preset element total set;
counting the number of occurrences of each element in the character to be processed;
generating a pinyin hash vector of the character to be processed according to the index positions and the occurrence counts; and
processing the pinyin hash vector with a preset embedding neural network to obtain the continuous features of the character to be processed.
In an optional implementation, generating the pinyin hash vector of the character to be processed according to the index positions and the occurrence counts comprises:
generating an all-zero vector with the same number of dimensions as the preset element total set; and
for the index position of each element in the preset element total set, determining the dimension of the all-zero vector that corresponds to the index position, and updating the value of that dimension to the number of occurrences of the element in the character to be processed, thereby obtaining the pinyin hash vector of the character to be processed.
In an optional implementation, the method further comprises:
obtaining all elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character; and
taking the union of the elements of all the characters to obtain the preset element total set, wherein each element in the preset element total set has a fixed index position.
In an optional implementation, obtaining at least one element of the Chinese character to be processed comprises:
determining, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the character to be processed, and using them as the elements of the character to be processed.
In a second aspect, an embodiment of the present invention provides a Chinese character processing device, the device comprising:
a first obtaining module, configured to obtain at least one element of a Chinese character to be processed, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
a determining module, configured to determine the index position of each element in a preset element total set;
a counting module, configured to count the number of occurrences of each element in the character to be processed;
a generation module, configured to generate a pinyin hash vector of the character to be processed according to the index positions and the occurrence counts; and
a processing module, configured to process the pinyin hash vector with a preset embedding neural network to obtain the continuous features of the character to be processed.
In an optional implementation, the generation module comprises:
a generation unit, configured to generate an all-zero vector with the same number of dimensions as the preset element total set;
a determination unit, configured to determine, for the index position of each element in the preset element total set, the dimension of the all-zero vector that corresponds to the index position; and
an updating unit, configured to update the value of that dimension to the number of occurrences of the element in the character to be processed, thereby obtaining the pinyin hash vector of the character to be processed.
In an optional implementation, the device further comprises:
a second obtaining module, configured to obtain all elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character; and
a union module, configured to take the union of the elements of all the characters to obtain the preset element total set, wherein each element in the preset element total set has a fixed index position.
In an optional implementation, the first obtaining module is specifically configured to determine, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the character to be processed, and to use them as the elements of the character to be processed.
In a third aspect, an embodiment of the present invention provides a terminal comprising a memory, a processor, and a Chinese character processing program that is stored in the memory and can run on the processor, wherein the steps of the Chinese character processing method of the first aspect are implemented when the Chinese character processing program is executed by the processor.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a Chinese character processing program is stored, wherein the steps of the Chinese character processing method of the first aspect are implemented when the Chinese character processing program is executed by a processor.
Compared with the prior art, the present invention has the following advantages:
In the embodiments of the present invention, at least one element of the Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in the preset element total set is then determined, and the number of occurrences of each element in the character is counted. A pinyin hash vector of the character is then generated from the index positions and the occurrence counts, and the pinyin hash vector is processed by a preset embedding neural network to obtain the continuous features of the character. Because the embodiments use a pinyin hash space to represent the characters in the preset dictionary, they are robust to characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the new characters need to be added, so scalability is strong.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flow chart of the steps of an embodiment of the Chinese character processing method of the present invention;
Fig. 2 is a structural block diagram of an embodiment of the Chinese character processing device of the present invention;
Fig. 3 is a structural block diagram of an embodiment of the terminal of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Referring to Fig. 1, a flow chart of the steps of an embodiment of the Chinese character processing method of the present invention is shown. The method may specifically include the following steps:
In step S101, at least one element of the Chinese character to be processed is obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
In the embodiments of the present invention, each Chinese character corresponds to a pinyin, which is composed of several pinyin letters, including an initial and a final. Each Chinese character consists of one stroke, or of several different or identical strokes.
The Chinese character to be processed may be a single character or a word made up of several characters. The embodiments of the present invention are described with a single character as an example, which does not limit the scope of protection of the invention.
In one embodiment, the pinyin of the character to be processed and the tone of that pinyin may be obtained first. The initial is then looked up in the pinyin according to a preset table of initials, the final is looked up in the pinyin according to a preset table of finals, and the strokes that make up the character to be processed are then obtained.
The pinyin of the character to be processed may be obtained from a preset correspondence between Chinese characters and their pinyin. The tone of the pinyin may be obtained from a preset correspondence between Chinese characters and the tones of their pinyin. The strokes that make up the character may be obtained from a preset correspondence between Chinese characters and their strokes.
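The lookup of the initial and final in a syllable can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the `INITIALS` table (23 entries, counting `y` and `w` as initials), the numbered-tone input format such as `"zhong1"`, and the function name `split_pinyin` are all assumptions made for the example.

```python
# Assumed table of initials; whether "y" and "w" count as initials is a convention choice.
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
]


def split_pinyin(syllable):
    """Split a numbered-tone syllable such as 'zhong1' into (initial, final, tone).

    Assumes the last character of the syllable is the tone digit.
    """
    body, tone = syllable[:-1], syllable[-1]
    # Try longer initials first so that "zh" is matched before "z".
    for initial in sorted(INITIALS, key=len, reverse=True):
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    # Syllables such as "an1" have no initial.
    return "", body, tone
```

The remainder after the initial is treated as the final here, which avoids needing a separate table of finals in the sketch; a table-based lookup as described in the text would validate that remainder against the preset table.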
In another embodiment of the present invention, for any Chinese character in the preset dictionary, all elements of the character may be obtained in advance, and the character together with its elements forms a table entry that is stored in the preset correspondence between Chinese characters and the elements they contain. The same is done for every other character in the preset dictionary.
In this step, therefore, the elements corresponding to the character to be processed can be determined directly from the preset correspondence between Chinese characters and the elements they contain, and used as the elements of the character to be processed. A single lookup then yields all elements of the character, which improves the efficiency of obtaining them.
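A minimal sketch of such a precomputed correspondence, in Python, is shown below. The table contents, the `kind:value` naming of elements, the romanized stroke names, and the helper `get_elements` are illustrative assumptions; the patent does not specify a storage format.

```python
# Hypothetical precomputed table: character -> all of its elements.
# Elements are tagged with their kind so that, for example, a final and a
# stroke can never collide by name. Stroke names are romanized for readability.
CHAR_TO_ELEMENTS = {
    "中": ["tone:1", "initial:zh", "final:ong",
           "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"],
    "人": ["tone:2", "initial:r", "final:en",
           "stroke:pie", "stroke:na"],
}


def get_elements(ch):
    """Return all elements of ch with a single table lookup (empty if unknown)."""
    return list(CHAR_TO_ELEMENTS.get(ch, []))
```

Repeated strokes are kept as repeated entries, because the later counting step depends on how many times each element occurs.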
In step S102, the index position of each element in the preset element total set is determined;
In the embodiments of the present invention, the preset element total set is generated from the elements of each Chinese character in the preset dictionary. It contains multiple elements: for example, all strokes used to form Chinese characters, and all initials, all finals, and all tones of pinyin. Each stroke is one element, each initial is one element, each final is one element, and each tone is one element, and each element has a corresponding index position in the preset element total set.
In the embodiments of the present invention, the preset element total set may be generated in advance as follows:
For example, all elements of each Chinese character in the preset dictionary are obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The union of the elements of all the characters is taken to obtain the preset element total set, wherein each element in the preset element total set has a fixed index position and different elements have different index positions.
For example, if one element of the character to be processed is "zh", the index position of that element in the preset element total set is looked up, that is, which row and which column of the preset element total set the element occupies.
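The union step and the fixed index positions can be sketched in Python as follows. The two-character example dictionary and the function `build_element_set` are hypothetical, and two simplifications are assumed: index positions are flat integers rather than a row and column, and they are fixed by sorting the element names.

```python
def build_element_set(char_to_elements):
    """Take the union of all characters' elements and give each a fixed index."""
    union = set()
    for elements in char_to_elements.values():
        union.update(elements)
    # Sorting makes the assignment deterministic, so an element keeps the same
    # index every time the set is rebuilt from the same dictionary.
    return {element: index for index, element in enumerate(sorted(union))}


# Hypothetical two-character preset dictionary.
example = {
    "中": ["tone:1", "initial:zh", "final:ong",
           "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"],
    "人": ["tone:2", "initial:r", "final:en", "stroke:pie", "stroke:na"],
}
element_index = build_element_set(example)
```

Looking up `element_index["initial:zh"]` then corresponds to determining the index position of the element "zh" in step S102.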
In step S103, the number of occurrences of each element in the character to be processed is counted;
Alternatively, in another embodiment of the present invention, the number of occurrences of each element in the preset element total set may also be counted.
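Counting occurrences is a plain frequency count over the element list. The sketch below uses Python's `collections.Counter`; the element list for 中 and the `kind:value` naming are illustrative assumptions.

```python
from collections import Counter

# Hypothetical element list for the character 中 (zhong1); the vertical
# stroke "shu" appears twice in its stroke sequence.
elements_of_zhong = ["tone:1", "initial:zh", "final:ong",
                     "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"]


def count_elements(elements):
    """Map each element to the number of times it occurs in the character."""
    return Counter(elements)


counts = count_elements(elements_of_zhong)
```

A `Counter` returns 0 for elements that never occur, which matches the treatment of absent elements in the vector construction of step S104.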
In step S104, the pinyin hash vector of the character to be processed is generated according to the index positions and the occurrence counts;
The pinyin hash vector has multiple dimensions; each dimension corresponds to one index position, and each index position corresponds to one element. Once the number of occurrences of an element in the character to be processed and its index position in the preset element total set have been determined, the dimension corresponding to that index position can be determined and its value set to the occurrence count. For the dimensions corresponding to elements whose occurrence count is 0, the value is set to 0. This yields the pinyin hash vector.
In an optional implementation, an all-zero vector with the same number of dimensions as the preset element total set may be generated. Then, for the index position of any element in the preset element total set, the dimension of the all-zero vector that corresponds to the index position is determined, and the value of that dimension is updated to the number of occurrences of the element in the character to be processed. The same operation is performed for the index position of every other element, thereby obtaining the pinyin hash vector of the character to be processed.
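The all-zero-vector construction can be sketched in Python as follows. The hand-written `ELEMENT_INDEX` with eleven elements and the function `pinyin_hash_vector` are hypothetical. Skipping elements that are missing from the index is an assumption of this sketch, not something the text specifies.

```python
from collections import Counter

# Hypothetical preset element total set with fixed index positions.
ELEMENT_INDEX = {
    "final:en": 0, "final:ong": 1, "initial:r": 2, "initial:zh": 3,
    "stroke:heng": 4, "stroke:hengzhe": 5, "stroke:na": 6, "stroke:pie": 7,
    "stroke:shu": 8, "tone:1": 9, "tone:2": 10,
}


def pinyin_hash_vector(elements, element_index):
    """Build the count vector: one dimension per element of the total set."""
    vector = [0] * len(element_index)           # all-zero vector
    for element, count in Counter(elements).items():
        if element in element_index:            # assumption: unknown elements are skipped
            vector[element_index[element]] = count
    return vector


zhong = ["tone:1", "initial:zh", "final:ong",
         "stroke:shu", "stroke:hengzhe", "stroke:heng", "stroke:shu"]
vector = pinyin_hash_vector(zhong, ELEMENT_INDEX)
```

The vector length depends only on the size of the element set, not on the number of characters, so a character that was never in the dictionary still gets a vector as long as its elements are known.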
In step S105, the pinyin hash vector is processed with a preset embedding neural network to obtain the continuous features of the character to be processed.
For the specific way of processing the pinyin hash vector with the preset embedding neural network to obtain the continuous features, reference may be made to existing related techniques; the embodiments of the present invention do not specifically limit it. After the continuous features of the character to be processed are obtained, the semantics of the character can be analyzed and classified according to them.
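Because the text leaves the embedding network unspecified, the following Python sketch only illustrates the shape of the computation: a single dense layer with a tanh activation that maps a count vector to a short vector of continuous values. The layer size, the random initialization, and the names `make_layer` and `embed` are assumptions; a real network would be trained and would likely be deeper.

```python
import math
import random


def make_layer(input_dim, output_dim, seed=0):
    """Create untrained weights for one dense layer (illustration only)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
               for _ in range(output_dim)]
    bias = [0.0] * output_dim
    return weights, bias


def embed(vector, weights, bias):
    """Map a pinyin hash vector to continuous features: tanh(W v + b)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


weights, bias = make_layer(input_dim=11, output_dim=4)
features = embed([0, 1, 0, 1, 1, 1, 0, 0, 2, 1, 0], weights, bias)
```

The input dimension of the layer equals the size of the element set, which is why the network does not need to be resized when characters are added to the dictionary.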
In the embodiments of the present invention, at least one element of the Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in the preset element total set is then determined, and the number of occurrences of each element in the character is counted. A pinyin hash vector of the character is then generated from the index positions and the occurrence counts, and the pinyin hash vector is processed by a preset embedding neural network to obtain the continuous features of the character. Because the embodiments use a pinyin hash space to represent the characters in the preset dictionary, they are robust to characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the new characters need to be added, so scalability is strong.
It should be noted that, for simplicity of description, the method embodiments are described as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the invention are not limited by the described order of actions, because according to the embodiments some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the invention.
Referring to Fig. 2, a structural block diagram of an embodiment of the Chinese character processing device of the present invention is shown. The device may specifically include the following modules:
a first obtaining module 11, configured to obtain at least one element of a Chinese character to be processed, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character to be processed;
a determining module 12, configured to determine the index position of each element in a preset element total set;
a counting module 13, configured to count the number of occurrences of each element in the character to be processed;
a generation module 14, configured to generate a pinyin hash vector of the character to be processed according to the index positions and the occurrence counts; and
a processing module 15, configured to process the pinyin hash vector with a preset embedding neural network to obtain the continuous features of the character to be processed.
In an optional implementation, the generation module 14 comprises:
a generation unit, configured to generate an all-zero vector with the same number of dimensions as the preset element total set;
a determination unit, configured to determine, for the index position of each element in the preset element total set, the dimension of the all-zero vector that corresponds to the index position; and
an updating unit, configured to update the value of that dimension to the number of occurrences of the element in the character to be processed, thereby obtaining the pinyin hash vector of the character to be processed.
In an optional implementation, the device further comprises:
a second obtaining module, configured to obtain all elements of each Chinese character in a preset dictionary, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character; and
a union module, configured to take the union of the elements of all the characters to obtain the preset element total set, wherein each element in the preset element total set has a fixed index position.
In an optional implementation, the first obtaining module 11 is specifically configured to determine, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the character to be processed, and to use them as the elements of the character to be processed.
In the embodiments of the present invention, at least one element of the Chinese character to be processed is first obtained, the elements including the tone of the character's pinyin, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the character. The index position of each element in the preset element total set is then determined, and the number of occurrences of each element in the character is counted. A pinyin hash vector of the character is then generated from the index positions and the occurrence counts, and the pinyin hash vector is processed by a preset embedding neural network to obtain the continuous features of the character. Because the embodiments use a pinyin hash space to represent the characters in the preset dictionary, they are robust to characters that do not appear in the preset dictionary. In addition, because the size of the pinyin hash space is constant, adding new characters to the preset dictionary does not affect the overall structure of the constructed pinyin hash space; only the elements corresponding to the new characters need to be added, so scalability is strong.
Because the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
The present invention also provides a Chinese character processing terminal, which may include a memory, a processor, and a Chinese character processing program that is stored in the memory and can run on the processor, wherein the steps of any Chinese character processing method described in the present invention are implemented when the Chinese character processing program is executed by the processor.
Fig. 3 is a block diagram of a Chinese character processing terminal 600 according to an exemplary embodiment. For example, terminal 600 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so on.
Referring to Fig. 3, terminal 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
Processing component 602 generally controls the overall operation of terminal 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 602 may include one or more processors 620 that execute instructions to complete all or part of the steps of the Chinese character processing method described above. In addition, processing component 602 may include one or more modules that facilitate interaction between processing component 602 and other components. For example, processing component 602 may include a multimedia module to facilitate interaction between multimedia component 608 and processing component 602.
Memory 604 is configured to store various types of data to support operation of terminal 600. Examples of such data include instructions for any application or method operating on terminal 600, contact data, phone book data, messages, pictures, videos, and so on. Memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination of them, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
Power supply module 606 provides electric power for the various assemblies of terminal 600.Power supply module 606 may include power management system System, one or more power supplys and other with for terminal 600 generate, manage, and distribute the associated component of electric power.
The multimedia component 608 includes a screen that provides an output interface between the terminal 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. A touch sensor may sense not only the boundary of a touch or slide action but also the duration and pressure associated with it. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the terminal 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC), which is configured to receive external audio signals when the terminal 600 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors that provide status assessments of various aspects of the terminal 600. For example, the sensor component 614 can detect the open/closed state of the terminal 600 and the relative positioning of components, for example the display and keypad of the terminal 600. It can also detect a change in position of the terminal 600 or of one of its components, the presence or absence of user contact with the terminal 600, the orientation or acceleration/deceleration of the terminal 600, and a change in the temperature of the terminal 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the terminal 600 and other devices. The terminal 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the Chinese character processing method. Specifically, the Chinese character processing method includes:
Obtaining at least one element contained in a Chinese character to be processed, the elements including the tone of the pinyin of the Chinese character to be processed, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the Chinese character to be processed;
Determining the index position of each element in a preset element total set;
Counting the number of occurrences of each element in the Chinese character to be processed;
Generating a phonetic hash vector of the Chinese character to be processed according to the index positions and the occurrence counts;
Processing the phonetic hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed.
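The five steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the element names, the eight-element set, the helper names hash_vector and embed, and the single dense layer with random weights standing in for the embedding network are all hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical total element set; a real one would cover a whole dictionary.
ELEMENTS = ["tone:1", "tone:2", "initial:s", "initial:sh",
            "final:an", "final:i", "stroke:heng", "stroke:shu"]
INDEX = {name: i for i, name in enumerate(ELEMENTS)}  # fixed index positions


def hash_vector(char_elements):
    """Steps 2-4: look up indices, count occurrences, fill the vector."""
    vector = [0] * len(ELEMENTS)
    for name, count in Counter(char_elements).items():
        vector[INDEX[name]] = count
    return vector


def embed(vector, weights):
    """Step 5 stand-in: one dense layer with tanh (an assumed toy network)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]


# Step 1: elements of 三 (san, first tone): initial s, final an,
# and three horizontal strokes.
san = ["tone:1", "initial:s", "final:an",
       "stroke:heng", "stroke:heng", "stroke:heng"]

rng = random.Random(0)  # untrained, randomly initialised weights
weights = [[rng.uniform(-0.1, 0.1) for _ in ELEMENTS] for _ in range(4)]

vec = hash_vector(san)
feature = embed(vec, weights)
print(vec)           # [1, 0, 1, 0, 1, 0, 3, 0]
print(len(feature))  # 4
```

The count vector is sparse and integer-valued, while the output of the network is a short dense vector of real numbers, which is what "continuous feature" refers to here.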
In an optional implementation, generating the phonetic hash vector of the Chinese character to be processed according to the index positions and the occurrence counts includes:
Generating an all-zero vector with the same dimension as the preset element total set;
For the index position of each element in the preset element total set, determining the dimension of the all-zero vector that corresponds to that index position, and updating the value in that dimension to the number of occurrences of the element in the Chinese character to be processed, thereby obtaining the phonetic hash vector of the Chinese character to be processed.
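As a worked example of this construction, assume a hypothetical ordered element set E of eight elements (the ordering and the element names are illustrative only). The character 十 (shi, second tone) has initial sh, final i, one horizontal stroke, and one vertical stroke, so each of those positions receives a count of 1 and every other position stays 0:

```latex
E = (\text{tone 1},\ \text{tone 2},\ \text{s},\ \text{sh},\ \text{an},\ \text{i},\ \text{heng},\ \text{shu}),
\qquad
v_i(c) = \#\{\text{occurrences of } E_i \text{ in character } c\}

v(\text{十}) = (0,\ 1,\ 0,\ 1,\ 0,\ 1,\ 1,\ 1)
```

A character with a repeated stroke would carry a count greater than 1 in that stroke's position.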
In an optional implementation, the method further includes:
Obtaining all elements of each Chinese character in a preset dictionary, the elements including the tone of the pinyin of the Chinese character, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the Chinese character;
Taking the union of the elements of all the Chinese characters to obtain the preset element total set, wherein each element in the preset element total set has a fixed index position.
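A minimal sketch of building the total set by union, assuming a two-character toy dictionary. The dictionary contents, the element names, and the function name build_element_index are hypothetical, and sorting is only one assumed way of giving every element a fixed, reproducible index position; the text does not say how the positions are assigned.

```python
# Hypothetical mini-dictionary: character -> list of its elements.
DICTIONARY = {
    "十": ["tone:2", "initial:sh", "final:i", "stroke:heng", "stroke:shu"],
    "三": ["tone:1", "initial:s", "final:an",
           "stroke:heng", "stroke:heng", "stroke:heng"],
}


def build_element_index(dictionary):
    """Union the elements of every character and give each a fixed index."""
    total = set()
    for elements in dictionary.values():
        total |= set(elements)  # repeated elements collapse in the union
    # Sorting (an assumption) makes the positions independent of input order.
    return {name: i for i, name in enumerate(sorted(total))}


index = build_element_index(DICTIONARY)
print(len(index))  # 8
```

Because the index is computed once from the dictionary, vectors for different characters share the same dimensions and can be compared or fed to the same network.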
In an optional implementation, obtaining at least one element of the Chinese character to be processed includes: determining, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the Chinese character to be processed, and taking them as the elements contained in the Chinese character to be processed.
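The preset correspondence can be pictured as a lookup table. The sketch below assumes a plain in-memory mapping; the table contents and the name elements_of are hypothetical, and returning an empty tuple for an unlisted character is an assumption, since the text does not say how such characters are handled.

```python
# Hypothetical preset correspondence table (character -> elements).
CHAR_TO_ELEMENTS = {
    "十": ("tone:2", "initial:sh", "final:i", "stroke:heng", "stroke:shu"),
}


def elements_of(char, table=CHAR_TO_ELEMENTS):
    """Return the preset elements of char, or an empty tuple if unlisted."""
    return table.get(char, ())


print(elements_of("十"))
```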
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions, which can be executed by the processor 620 of the terminal 600 to complete the Chinese character processing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. When the instructions in the storage medium are executed by the processor of the terminal, the terminal is enabled to perform the steps of any of the Chinese character processing methods described in the present invention.
The Chinese character processing scheme provided herein is not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct a system embodying the scheme of the present invention is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in various programming languages, and that the description of a specific language above is given in order to disclose the best mode of the invention.
In the description provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this manner of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the Chinese character processing scheme according to embodiments of the present invention. The present invention may also be implemented as a device or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (8)

1. A Chinese character processing method, characterized in that the method includes:
Obtaining at least one element contained in a Chinese character to be processed, the elements including the tone of the pinyin of the Chinese character to be processed, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the Chinese character to be processed;
Determining the index position of each element in a preset element total set;
Counting the number of occurrences of each element in the Chinese character to be processed;
Generating a phonetic hash vector of the Chinese character to be processed according to the index positions and the occurrence counts;
Processing the phonetic hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed;
Wherein generating the phonetic hash vector of the Chinese character to be processed according to the index positions and the occurrence counts includes:
Generating an all-zero vector with the same dimension as the preset element total set;
For the index position of each element in the preset element total set, determining the dimension of the all-zero vector that corresponds to that index position, and updating the value in that dimension to the number of occurrences of the element in the Chinese character to be processed, thereby obtaining the phonetic hash vector of the Chinese character to be processed.
2. The method according to claim 1, characterized in that the method further includes:
Obtaining all elements of each Chinese character in a preset dictionary, the elements including the tone of the pinyin of the Chinese character, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the Chinese character;
Taking the union of the elements of all the Chinese characters to obtain the preset element total set, wherein each element in the preset element total set has a fixed index position.
3. The method according to claim 1, characterized in that obtaining at least one element of the Chinese character to be processed includes:
Determining, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the Chinese character to be processed, and taking them as the elements contained in the Chinese character to be processed.
4. A Chinese character processing device, characterized in that the device includes:
A first obtaining module, configured to obtain at least one element contained in a Chinese character to be processed, the elements including the tone of the pinyin of the Chinese character to be processed, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the Chinese character to be processed;
A determining module, configured to determine the index position of each element in a preset element total set;
A statistics module, configured to count the number of occurrences of each element in the Chinese character to be processed;
A generation module, configured to generate a phonetic hash vector of the Chinese character to be processed according to the index positions and the occurrence counts;
A processing module, configured to process the phonetic hash vector with a preset embedding neural network to obtain a continuous feature of the Chinese character to be processed;
Wherein the generation module includes:
A generation unit, configured to generate an all-zero vector with the same dimension as the preset element total set;
A determination unit, configured to determine, for the index position of each element in the preset element total set, the dimension of the all-zero vector that corresponds to that index position;
An updating unit, configured to update the value in that dimension to the number of occurrences of the element in the Chinese character to be processed, thereby obtaining the phonetic hash vector of the Chinese character to be processed.
5. The device according to claim 4, characterized in that the device further includes:
A second obtaining module, configured to obtain all elements of each Chinese character in a preset dictionary, the elements including the tone of the pinyin of the Chinese character, the initial contained in the pinyin, the final contained in the pinyin, and the strokes that make up the Chinese character;
A union module, configured to take the union of the elements of all the Chinese characters to obtain the preset element total set, wherein each element in the preset element total set has a fixed index position.
6. The device according to claim 4, characterized in that the first obtaining module is specifically configured to: determine, according to a preset correspondence between Chinese characters and the elements they contain, the elements corresponding to the Chinese character to be processed, and take them as the elements contained in the Chinese character to be processed.
7. A terminal, characterized by including: a memory, a processor, and a Chinese character processing program stored in the memory and executable on the processor, wherein the Chinese character processing program, when executed by the processor, implements the steps of the Chinese character processing method according to any one of claims 1 to 3.
8. A computer-readable storage medium, characterized in that a Chinese character processing program is stored on the computer-readable storage medium, and the Chinese character processing program, when executed by a processor, implements the steps of the Chinese character processing method according to any one of claims 1 to 3.
CN201810191423.4A 2018-03-08 2018-03-08 Chinese character processing method and device Active CN108549627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810191423.4A CN108549627B (en) 2018-03-08 2018-03-08 Chinese character processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810191423.4A CN108549627B (en) 2018-03-08 2018-03-08 Chinese character processing method and device

Publications (2)

Publication Number Publication Date
CN108549627A CN108549627A (en) 2018-09-18
CN108549627B true CN108549627B (en) 2019-10-01

Family

ID=63516115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810191423.4A Active CN108549627B (en) 2018-03-08 2018-03-08 Chinese character processing method and device

Country Status (1)

Country Link
CN (1) CN108549627B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111785249A (en) * 2020-07-10 2020-10-16 恒信东方文化股份有限公司 Training method, device and obtaining method of input phoneme of speech synthesis

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609185A (en) * 2017-09-30 2018-01-19 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and computer-readable recording medium for POI Similarity Measure

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678272B (en) * 2012-09-17 2016-04-06 北京信息科技大学 The disposal route of unregistered word in the interdependent treebank of Chinese

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
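An automatic classification method for Chinese documents based on N-Gram technology (一种基于N-Gram技术的中文文献自动分类方法); He Hao et al.; 《情报学报》; 20020831; Vol. 21, No. 4; Sections 2-3 *
Research on a text deduplication method based on N-Gram (基于N-Gram的文本去重方法研究); Wang Xiaohua et al.; 《杭州电子科技大学学报》; 20100430; Vol. 30, No. 2; pp. 61-64 *
Research on a Chinese character vector method based on the inherent attributes of Chinese characters (基于汉字固有属性的中文字向量方法研究); Hu Hao et al.; 《中文信息学报》; 20170531; Vol. 31, No. 3; abstract, Section 3 *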

Also Published As

Publication number Publication date
CN108549627A (en) 2018-09-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant