CN105653506A - Method and device for processing texts in GPU on basis of character encoding conversion

Info

Publication number: CN105653506A
Application number: CN201511020414.1A
Authority: CN (China)
Prior art keywords: character, floating point, point number, number type, gpu
Legal status: Granted; active (as listed by Google Patents, which states that this is an assumption and not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN105653506B (en)
Inventor: 潘昊
Current assignee: Beijing QIYI Century Science and Technology Co Ltd (as listed by Google Patents; may be inaccurate)
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Application filed by: Beijing QIYI Century Science and Technology Co Ltd
Events: priority to CN201511020414.1A; publication of CN105653506A; application granted; publication of CN105653506B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/151: Transformation
    • G06F40/126: Character encoding

Abstract

Embodiments of the invention provide a method and device for processing texts in a GPU on the basis of character encoding conversion. The method comprises the following steps: obtaining a binary encoding form of each character in an input text; judging whether the binary encoding form is consistent with a preset encoding form; if the binary encoding form is not consistent with the preset encoding form, carrying out binary encoding on the character by adopting the preset encoding form and converting the encoded character into a binary floating point number type; and if the binary encoding form is consistent with the preset encoding form, converting the character into the binary floating point number type and submitting the character of the binary floating point number type to the GPU to calculate and process. Through the embodiments of the invention, the characters in the texts can be converted into the binary floating point number type which can be processed by the GPU, so that the floating point number processing ability of the GPU is effectively utilized.

Description

Method and device for processing texts in a GPU on the basis of character encoding conversion
Technical field
The present invention relates to the technical field of GPU data processing, and in particular to a method and device for processing texts in a GPU on the basis of character encoding conversion.
Background
In recent years, the emergence of the graphics processing unit (GPU) has given a strong push to the development of high-performance computing. The powerful computing performance of the GPU has produced more success stories than traditional solutions, and GPU data processing is now increasingly applied to big data computing and machine learning.
In data processing, the GPU has a clear hardware-architecture advantage over the central processing unit (CPU), particularly in its floating point processing capability. At present, however, the GPU is mainly used in fields such as graphics processing, video transcoding, and speech analysis. Text is still processed mainly on the CPU, and because of the CPU architecture its text processing speed is poor, which limits the efficiency of text processing.
Summary of the invention
The object of the embodiments of the present invention is to provide a method and device for processing texts in a GPU on the basis of character encoding conversion, so that the floating point processing capability of the GPU is used to increase the speed of text processing and significantly improve the speed of text analysis.
To achieve the above object, an embodiment of the present invention discloses a method for processing texts in a GPU on the basis of character encoding conversion, the method comprising:
obtaining the binary encoding form of each character in an input text;
judging whether the binary encoding form is consistent with a preset encoding form;
if not, binary-encoding the character using the preset encoding form and converting the encoded character into a binary floating point number type; if so, converting the character into the binary floating point number type;
submitting the character converted into the binary floating point number type to the GPU for computation.
Preferably, the preset encoding form comprises one of the following encoding forms:
the Unicode encoding form, the GB2312 encoding form, the GBK encoding form, or the GB18030 encoding form.
Preferably, before judging whether the binary encoding form is consistent with the preset encoding form, the method further comprises:
obtaining the binary floating point number type supported by the GPU;
converting the encoded character into a binary floating point number type then comprises:
converting the encoded character into the binary floating point number type supported by the GPU;
and converting the character into a binary floating point number type comprises:
converting the character into the binary floating point number type supported by the GPU.
Preferably, obtaining the binary floating point number type supported by the GPU comprises:
obtaining the binary floating point number type supported by the GPU through the API provided by the GPU computing framework.
Preferably, converting the encoded character into a binary floating point number type comprises:
converting the encoded character into a preset binary floating point number type;
and converting the character into a binary floating point number type comprises:
converting the character into the preset binary floating point number type.
Preferably, submitting the character converted into the binary floating point number type to the GPU for computation comprises:
judging whether the length of the binary floating point number type supported by the GPU is not less than the length of the binary floating point number type into which the character has been converted;
if so, submitting the character converted into the binary floating point number type to the GPU, so that the GPU processes the converted character directly;
otherwise, splitting the character converted into the binary floating point number type into pieces of a length that the GPU can hold, and sending the split pieces to the GPU for processing.
Preferably, the binary floating point number types supported by the GPU comprise: the half precision, single precision, and double precision binary floating point number types.
The present invention also provides a device for processing texts in a GPU on the basis of character encoding conversion, the device comprising:
a character encoding obtaining unit, configured to obtain the binary encoding form of each character in an input text;
an encoding judging unit, configured to judge whether the binary encoding form is consistent with a preset encoding form, to trigger the encoding arrangement unit if the two are inconsistent, and to trigger the transcoder unit if they are consistent;
the encoding arrangement unit, configured to binary-encode the character using the preset encoding form;
the transcoder unit, configured to convert the encoded character, or the character, into a binary floating point number type;
a character submission processing unit, configured to submit the character converted into the binary floating point number type to the GPU for computation.
Preferably, the device further comprises a GPU obtaining unit, configured to obtain the binary floating point number type supported by the GPU;
the transcoder unit is then specifically configured to convert the encoded character into the binary floating point number type supported by the GPU, or to convert the character into the binary floating point number type supported by the GPU.
Preferably, the transcoder unit is specifically configured to convert the encoded character, or the character, into a preset binary floating point number type.
Preferably, the character submission processing unit comprises a GPU parameter judging subunit, a character splitting subunit, and a character processing subunit;
the GPU parameter judging subunit is configured to judge whether the length of the binary floating point number type supported by the GPU is not less than the length of the binary floating point number type into which the character has been converted, to trigger the character splitting subunit if not, and to trigger the character processing subunit if so;
the character splitting subunit is configured to split the character converted into the binary floating point number type into pieces of a length that the GPU can hold, and to send the split pieces to the GPU for processing;
the character processing subunit is configured to submit the character converted into the binary floating point number type to the GPU, so that the GPU processes the converted character directly.
The method and device for processing texts in a GPU on the basis of character encoding conversion provided by the embodiments of the present invention supply a character encoding conversion method by which the characters in a text can be converted into a binary floating point number type that the GPU can process, so that the floating point processing capability of the GPU is used effectively. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for processing texts in a GPU on the basis of character encoding conversion provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another method for processing texts in a GPU on the basis of character encoding conversion provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for processing texts in a GPU on the basis of character encoding conversion provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another device for processing texts in a GPU on the basis of character encoding conversion provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The method and device for processing texts in a GPU on the basis of character encoding conversion provided by the embodiments of the present invention supply a character encoding method that converts the characters in a text into a binary floating point number type that the GPU can process, and thereby use the floating point processing capability of the GPU effectively.
Fig. 1 is a schematic flowchart of a method for processing texts in a GPU on the basis of character encoding conversion provided by an embodiment of the present invention. The method comprises the following steps:
S101: obtain the binary encoding form of each character in the input text.
The binary encoding form of each character may be the ASCII encoding form, the Unicode encoding form, or another encoding form. This embodiment does not limit the binary encoding form of the characters in the text, as long as it is a binary encoding used in the text.
Text can be encoded in many ways, and several relatively standardized and unified encoding forms have emerged that simplify subsequent processing of the text, for example the Unicode, GB2312, GBK, and GB18030 encoding forms. Before the characters in a text are processed, their encoding form must first be confirmed. There are many ways to obtain it; generally, the binary encoding form of each character is obtained from known conditions, for example with a commonly used plug-in that identifies character encoding forms and determines whether the text uses the Unicode encoding form or another one. Obtaining the binary encoding of a character belongs to the prior art, and this embodiment does not limit how the binary encoding form of each character in the input text is obtained, as long as it can be obtained.
S102: judge whether the binary encoding form is consistent with the preset encoding form; if not, perform step S103; if so, skip step S103 and perform step S104.
The preset encoding form is a single unified encoding form and is one of the following: the Unicode encoding form, the GB2312 encoding form, the GBK encoding form, or the GB18030 encoding form.
To judge whether the binary encoding form is consistent with the preset encoding form, information such as the coding structure and code length of the binary encoding form of the characters in the text must be determined. This information is compared with the coding structure and code length of the preset encoding form to judge whether the two encoding forms are consistent, that is, whether the encoding form of the characters in the text matches the preset encoding form.
The embodiment of the present invention converts the various encoding forms of the characters in a text into one unified encoding form, namely the preset encoding form. Working from the preset encoding form simplifies the subsequent conversion strategy, because it avoids needing a separate conversion procedure for each encoding form that a text may use.
S103: binary-encode the character using the preset encoding form.
When the binary encoding form of the character is inconsistent with the preset encoding form, the character must be binary-encoded using the preset encoding form. For example, if the preset encoding form is the Unicode encoding form and the characters in the text use the GBK encoding form, the GBK-encoded characters in the text must be converted, according to the character encoding rules, into the corresponding Unicode-encoded characters.
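Steps S102 and S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the encoding of the input is assumed to be already known by name (the patent leaves detection to existing tools), the consistency check is reduced to comparing normalised encoding names rather than coding structure and code length, and big-endian UTF-16 stands in for the preset Unicode (UCS-2) form. The function names `same_encoding` and `to_preset` are hypothetical.

```python
import codecs

# Assumption: UTF-16 big-endian stands in for the preset Unicode (UCS-2) form.
PRESET_ENCODING = "utf-16-be"


def same_encoding(declared: str, preset: str = PRESET_ENCODING) -> bool:
    """S102 (simplified): compare encoding names after alias normalisation."""
    return codecs.lookup(declared).name == codecs.lookup(preset).name


def to_preset(raw: bytes, declared: str, preset: str = PRESET_ENCODING) -> bytes:
    """S103: re-encode the bytes in the preset form only when the forms differ."""
    if same_encoding(declared, preset):
        return raw  # already in the preset form, skip S103
    return raw.decode(declared).encode(preset)


gbk_bytes = "中".encode("gbk")  # the character as it would appear in a GBK text
unicode_bytes = to_preset(gbk_bytes, "GBK")
print(gbk_bytes.hex(), "->", unicode_bytes.hex())
```

Normalising every input to one preset form means that the later conversion to floating point only has to handle a single code layout.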
S104: convert the encoded character, or the character, into a binary floating point number type.
When the binary encoding form of the character is inconsistent with the preset encoding form, the character is first binary-encoded using the preset encoding form, and the encoded character is then converted into a binary floating point number type. When the binary encoding form of the character is consistent with the preset encoding form, the character is converted into a binary floating point number type directly.
The binary floating point number types comprise: the half precision, single precision, and double precision binary floating point number types.
A character may be converted into a binary floating point number on the CPU or on the GPU. During conversion, the encoded character must be converted into a particular binary floating point number according to a fixed conversion rule, and the specific conversion may depend on the attributes of the character itself. For example, if the characters in the text can be converted to the Unicode encoding form, a character may preferably be converted into a single precision binary floating point number; if the characters in the text include Unicode characters from the supplementary planes, the character may likewise be converted into a single precision binary floating point number. Here, the Unicode code space is divided into 17 groups, each called a plane, and each plane has 65536 code points. The Basic Multilingual Plane is plane 0, a coding section of Unicode covering U+0000 to U+FFFF; the planes other than plane 0 are called supplementary planes. The binary floating point number type may also be chosen according to the amount of memory the electronic device has for storing the converted characters, or the bandwidth available when the converted characters are sent to the GPU. For example, if limited by such memory or bandwidth, the character may instead be converted into half precision binary floating point numbers.
The conversion may be completed on the CPU or on the GPU. If it is completed on the CPU, the converted binary floating point numbers must then be sent to the GPU for storage; if it is completed on the GPU, they are stored inside the GPU directly.
The conversion is illustrated below with a concrete example.
According to the international standard IEEE 754, the binary floating point number V of any character can be expressed in the following form:
V = (-1)^S × M × 2^E
(-1)^S represents the sign: when S = 0, V is positive; when S = 1, V is negative. M is the significand, which is greater than or equal to 1 and less than 2. 2^E represents the exponent part, where E is the exponent.
According to IEEE 754, in a 32-bit single precision floating point number the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M. A 16-bit half precision floating point number occupies 2 bytes: the highest bit is the sign bit S, the next 5 bits are the exponent E, and the remaining 10 bits are the significand M.
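The field layout described above can be checked with a short Python sketch. It uses the standard `struct` module to expose the sign, stored exponent, and fraction bits of single and half precision values, and then evaluates V = (-1)^S × M × 2^E for normalised numbers. The helper names are hypothetical, and the stored exponent is the true exponent E plus the bias (127 for single precision, 15 for half precision).

```python
import struct


def float32_fields(value: float) -> tuple:
    """Return (sign, stored_exponent, fraction) of value as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                      # 1 bit
    stored_exponent = (bits >> 23) & 0xFF  # 8 bits, equals E + 127
    fraction = bits & 0x7FFFFF             # 23 bits after the implicit leading 1
    return sign, stored_exponent, fraction


def float16_fields(value: float) -> tuple:
    """Return (sign, stored_exponent, fraction) of value as an IEEE 754 half."""
    (bits,) = struct.unpack(">H", struct.pack(">e", value))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF  # 1 + 5 + 10 bits


def rebuild(sign: int, stored_exponent: int, fraction: int,
            bias: int, fraction_bits: int) -> float:
    """Evaluate V = (-1)^S * M * 2^E for a normalised number."""
    m = 1 + fraction / (1 << fraction_bits)  # significand M in [1, 2)
    return (-1) ** sign * m * 2.0 ** (stored_exponent - bias)


s, e, f = float32_fields(-6.5)
print(s, e, hex(f), rebuild(s, e, f, bias=127, fraction_bits=23))
```

The sketch covers normalised values only; zeros, subnormals, infinities, and NaNs follow separate rules in IEEE 754 and are not handled here.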
Suppose the characters in the text are ultimately converted into single precision binary floating point numbers and the preset encoding form is UCS-2 within the Unicode encoding form. When a UCS-2 encoded character is converted into a single precision binary floating point number, the exponent E is set to a fixed value, the 16 binary digits (four hexadecimal digits) of the character's UCS-2 code are placed in the high-order or low-order bits of the mantissa M, and the remaining bits of M are filled with 0.
For example, after the Chinese character "中" is encoded with UCS-2 in the Unicode encoding form, its Unicode code in hexadecimal is "\u4e2d", and its binary encoding is 0100111000101101. With the exponent E fixed at 1, adding the single precision exponent bias of 127 gives a stored exponent of 128 (binary 10000000). Placing the 16 code bits in the high-order bits of the mantissa and filling the remaining 7 bits with 0 gives the single precision binary floating point number 01000000001001110001011010000000.
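The worked example can be reproduced with the following sketch. It assumes the variant shown in the example: sign bit 0, exponent E fixed at 1, and the 16-bit UCS-2 code placed in the high-order bits of the 23-bit mantissa. The function names are hypothetical, and characters outside the Basic Multilingual Plane are rejected because UCS-2 cannot represent them.

```python
import struct

EXPONENT_E = 1      # fixed exponent used in the example
BIAS = 127          # single precision exponent bias
FRACTION_BITS = 23  # single precision mantissa width
CODE_BITS = 16      # width of a UCS-2 code


def char_to_float32_bits(ch: str) -> int:
    """Pack one UCS-2 character into a single precision bit pattern."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("this sketch covers UCS-2 (BMP) characters only")
    stored_exponent = EXPONENT_E + BIAS  # 128
    return (stored_exponent << FRACTION_BITS) | (code << (FRACTION_BITS - CODE_BITS))


def float32_bits_to_char(bits: int) -> str:
    """Recover the character from the high 16 bits of the mantissa."""
    return chr((bits & 0x7FFFFF) >> (FRACTION_BITS - CODE_BITS))


bits = char_to_float32_bits("中")
as_float = struct.unpack(">f", struct.pack(">I", bits))[0]
print(f"{bits:032b}", as_float)
```

Because the exponent is fixed and nonzero, every character maps to an ordinary normalised float, so the mapping is reversible and never produces a NaN or subnormal value.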
The conversion in this embodiment is not limited to this approach. Any conversion that uses a fixed rule to turn the encoded character, or the character, into a specific binary floating point number can meet the system requirements.
S105: submit the character converted into the preset binary floating point number type to the GPU for computation.
By applying this embodiment of the present invention, the characters in a text can be converted into a binary floating point number type that the GPU can process, so that the floating point processing capability of the GPU is used effectively.
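The flow of Fig. 1 up to the point of submission can be sketched end to end as follows. This is an illustration under assumptions rather than the patented implementation: the input encoding is given by name, the conversion rule is the fixed-exponent single precision rule from the example above, and the result is a contiguous buffer of 32-bit floats built with the standard `array` module. Handing that buffer to an actual GPU framework is outside the sketch, and the function names are hypothetical.

```python
import struct
from array import array


def text_to_float_buffer(raw: bytes, declared_encoding: str) -> array:
    """S101-S104: decode to Unicode, then map each character to one float32."""
    text = raw.decode(declared_encoding)  # normalise to Unicode code points
    floats = array("f")                   # contiguous single precision buffer
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("this sketch covers UCS-2 (BMP) characters only")
        bits = (128 << 23) | (code << 7)  # stored exponent 128, code in high mantissa bits
        floats.append(struct.unpack(">f", struct.pack(">I", bits))[0])
    return floats


def float_buffer_to_text(floats: array) -> str:
    """Inverse mapping, useful for checking that no information was lost."""
    chars = []
    for value in floats:
        (bits,) = struct.unpack(">I", struct.pack(">f", value))
        chars.append(chr((bits & 0x7FFFFF) >> 7))
    return "".join(chars)


buffer = text_to_float_buffer("中文GPU".encode("gbk"), "gbk")
print(len(buffer), float_buffer_to_text(buffer))
```

A flat float buffer of this kind is the usual shape of input for GPU kernels, which is why the sketch stops there.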
Fig. 2 is a schematic flowchart of another method for processing texts in a GPU on the basis of character encoding conversion provided by an embodiment of the present invention. The method comprises the following steps:
S201: obtain the binary encoding form of each character in the input text.
S202: obtain the binary floating point number type supported by the GPU.
The binary floating point number types supported by the GPU comprise: the half precision, single precision, and double precision binary floating point number types.
Through the API provided by the GPU computing framework, the size of the floating point number that the GPU can hold is determined, and the binary floating point number type supported by the GPU is obtained from it.
S203: judge whether the binary encoding form is consistent with the preset encoding form; if not, perform step S204; if so, skip step S204 and perform step S205.
S204: binary-encode the character using the preset encoding form.
S205: convert the encoded character, or the character, into a binary floating point number type.
S206: submit the character converted into the binary floating point number type to the GPU for computation.
By applying this embodiment of the present invention, the characters in a text can be converted into a binary floating point number type that the GPU supports directly. When processing floating point numbers of a matching type, the GPU can use its own floating point processing capability quickly and efficiently.
On the basis of Fig. 1, in another implementation provided by an embodiment of the present invention, step S105 of the method further comprises:
judging whether the length of the binary floating point number type supported by the GPU is not less than the length of the binary floating point number type into which the character has been converted;
if so, submitting the character converted into the binary floating point number type to the GPU, so that the GPU processes the converted character directly;
otherwise, splitting the character converted into the binary floating point number type into pieces of a length that the GPU can hold, and sending the split pieces to the GPU for processing.
Splitting means dividing the character converted into the binary floating point number type into pieces of a length that the GPU can hold. For example, suppose the GPU supports only half precision binary floating point numbers, and the Chinese character "中" is represented as the single precision binary floating point number 01000000001001110001011010000000. The length of the type supported by the GPU is then smaller than the length of the single precision representation of "中", that is, the GPU cannot hold it. The single precision number must therefore be split into two half precision binary floating point numbers so that the GPU can hold the character. After splitting, "中" is represented by the two half precision binary floating point numbers 0100000100111000 and 0100000010110100.
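The split in this example can be sketched as follows. The text does not state the splitting rule explicitly; the rule used here is inferred from the bit patterns in the example and should be read as an assumption: the 16-bit code is divided into a high byte and a low byte, each byte is placed in the top 8 bits of a 10-bit half precision mantissa, and the exponent E stays fixed at 1 (stored as 1 + 15 = 16). The function names and the `gpu_float_bits` parameter are hypothetical.

```python
HALF_BIAS = 15           # half precision exponent bias
HALF_FRACTION_BITS = 10  # half precision mantissa width


def split_float32_to_float16_pair(bits32: int) -> tuple:
    """Split one fixed-exponent single into two half precision bit patterns."""
    code = (bits32 & 0x7FFFFF) >> 7              # 16-bit code from the mantissa
    stored_exponent = (1 + HALF_BIAS) << HALF_FRACTION_BITS
    high = stored_exponent | ((code >> 8) << 2)  # high byte, padded with two 0 bits
    low = stored_exponent | ((code & 0xFF) << 2) # low byte, padded with two 0 bits
    return high, low


def merge_float16_pair(high: int, low: int) -> int:
    """Recover the 16-bit code from the two half precision bit patterns."""
    return (((high & 0x3FF) >> 2) << 8) | ((low & 0x3FF) >> 2)


def prepare_for_gpu(bits32: int, gpu_float_bits: int) -> list:
    """Submit directly when the GPU type is wide enough, otherwise split."""
    if gpu_float_bits >= 32:
        return [bits32]
    return list(split_float32_to_float16_pair(bits32))


pieces = prepare_for_gpu(0x40271680, gpu_float_bits=16)  # "中" as a single
print([f"{p:016b}" for p in pieces], hex(merge_float16_pair(*pieces)))
```

Keeping the same fixed exponent in both halves means each piece is still an ordinary normalised number, at the cost of doubling the element count for GPUs limited to half precision.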
By applying this embodiment of the present invention, when the length of the binary floating point number type supported by the GPU cannot handle the converted binary floating point number type, splitting the converted character still makes effective use of the floating point processing capability of the GPU and extends the range of floating point data the GPU can process.
The characters in a text can thus be converted into a binary floating point number type that the GPU supports directly. When processing floating point numbers of a matching type, the GPU can use its own floating point processing capability quickly and efficiently.
Fig. 3 is a schematic structural diagram of a device for processing texts in a GPU on the basis of character encoding conversion provided by an embodiment of the present invention. It corresponds to the flow shown in Fig. 1 and comprises a character encoding obtaining unit 301, an encoding judging unit 302, an encoding arrangement unit 303, a transcoder unit 304, and a character submission processing unit 305:
the character encoding obtaining unit 301 is configured to obtain the binary encoding form of each character in an input text;
the encoding judging unit 302 is configured to judge whether the binary encoding form is consistent with a preset encoding form, to trigger the encoding arrangement unit if the two are inconsistent, and to trigger the transcoder unit if they are consistent;
the encoding arrangement unit 303 is configured to binary-encode the character using the preset encoding form;
the transcoder unit 304 is configured to convert the encoded character, or the character, into a binary floating point number type;
the character submission processing unit 305 is configured to submit the character converted into the binary floating point number type to the GPU for computation.
By applying this embodiment of the present invention, the characters in a text can be converted into a binary floating point number type that the GPU can process, so that the floating point processing capability of the GPU is used effectively.
The transcoder unit 304 is specifically configured to convert the encoded character, or the character, into a preset binary floating point number type.
The character submission processing unit 305 comprises a GPU parameter judging subunit (not shown), a character splitting subunit (not shown), and a character processing subunit (not shown).
The GPU parameter judging subunit is configured to judge whether the length of the binary floating point number type supported by the GPU is not less than the length of the binary floating point number type into which the character has been converted, to trigger the character splitting subunit if not, and to trigger the character processing subunit if so;
the character splitting subunit is configured to split the character converted into the binary floating point number type into pieces of a length that the GPU can hold, and to send the split pieces to the GPU for processing;
the character processing subunit is configured to submit the character converted into the binary floating point number type to the GPU, so that the GPU processes the converted character directly.
By applying this embodiment of the present invention, when the length of the binary floating point number type supported by the GPU cannot handle the converted binary floating point number type, splitting the converted character makes effective use of the floating point processing capability of the GPU and extends the range of floating point data the GPU can process.
The apparatus structure schematic diagram of text-processing in a kind of GPU based on character coding conversion that Fig. 4 provides for the embodiment of the present invention, corresponding with the flow process shown in Fig. 2, comprising: character coding obtaining unit 401, GPU obtaining unit 402, coding judging unit 403, coding arrangement unit 404, transcoder unit 405 and submission processing unit 406:
Character coding obtaining unit 401, for obtaining the binary coding form of character in each text of input;
GPU obtaining unit 402, for obtaining the binary floating point number type that GPU supports;
Coding judging unit 403, for judging that whether described binary coding form is consistent with default coding form, if described binary coding form is inconsistent with the coding form preset, then trigger coding arrangement unit, if described binary coding form is consistent with the coding form preset, then directly trigger transcoder unit;
Described coding arrangement unit 404, for adopting described default coding form that described character is carried out binary coding;
Described transcoder unit 405, specifically for by the character conversion after described coding being the binary floating point number type of described GPU support; Described is the binary floating point number type that described GPU supports by described character conversion;
Processing unit 406 submitted in character, carries out computing for the character being converted to described default binary floating point number type is submitted to GPU.
The application embodiment of the present invention, character in text can be converted into the binary floating point number type that GPU can directly support, GPU is when processing the floating number of the binary floating point number type mated mutually, it is possible to the floating number processing power utilizing GPU self rapidly and efficiently.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article, or device that comprises the element.
The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (11)

1. A method for processing texts in a GPU on the basis of character encoding conversion, applied to an electronic device, characterized in that the method comprises the steps of:
Obtaining the binary encoding format of each character in an input text;
Judging whether the binary encoding format is consistent with a preset encoding format;
If inconsistent, binary-encoding the character using the preset encoding format, and converting the encoded character into a binary floating-point type; if consistent, converting the character into a binary floating-point type;
Submitting the character converted into the binary floating-point type to the GPU for computation.
2. The method according to claim 1, characterized in that the preset encoding format comprises one of the following encoding formats:
Unicode encoding format, GB2312 encoding format, GBK encoding format, or GB18030 encoding format.
3. The method according to claim 1, characterized in that, before the judging whether the binary encoding format is consistent with the preset encoding format, the method further comprises:
Obtaining the binary floating-point type supported by the GPU;
The converting the encoded character into a binary floating-point type comprises:
Converting the encoded character into the binary floating-point type supported by the GPU;
The converting the character into a binary floating-point type comprises:
Converting the character into the binary floating-point type supported by the GPU.
4. The method according to claim 3, characterized in that the obtaining the binary floating-point type supported by the GPU comprises:
Obtaining the binary floating-point type supported by the GPU through an API provided by a GPU computing framework.
5. The method according to claim 1, characterized in that the converting the encoded character into a binary floating-point type comprises:
Converting the encoded character into a preset binary floating-point type;
The converting the character into a binary floating-point type comprises:
Converting the character into the preset binary floating-point type.
6. The method according to claim 5, characterized in that the submitting the character converted into the binary floating-point type to the GPU for computation comprises:
Judging whether the length of the binary floating-point type supported by the GPU is not less than the length of the binary floating-point type of the converted character;
If so, submitting the character converted into the binary floating-point type to the GPU, so that the GPU directly processes the converted character of the binary floating-point type;
Otherwise, splitting the character converted into the binary floating-point type into lengths that the GPU can accommodate, and sending the split character of the binary floating-point type to the GPU for processing.
7. The method according to any one of claims 1-6, characterized in that the binary floating-point type supported by the GPU comprises: a half-precision binary floating-point type, a single-precision binary floating-point type, and a double-precision binary floating-point type.
8. A device for processing texts in a GPU on the basis of character encoding conversion, applied to an electronic device, characterized in that the device comprises:
A character coding obtaining unit, for obtaining the binary encoding format of each character in the input text;
A coding judging unit, for judging whether the binary encoding format is consistent with a preset encoding format; if the binary encoding format is inconsistent with the preset encoding format, a coding arrangement unit is triggered; if it is consistent, a transcoder unit is triggered;
The coding arrangement unit, for binary-encoding the character using the preset encoding format;
The transcoder unit, for converting the encoded character or the character into a binary floating-point type;
A character submission processing unit, for submitting the character converted into the binary floating-point type to the GPU for computation.
9. The device according to claim 8, characterized in that the device further comprises:
A GPU obtaining unit, for obtaining the binary floating-point type supported by the GPU;
The transcoder unit is specifically for converting the encoded character into the binary floating-point type supported by the GPU, or for converting the character into the binary floating-point type supported by the GPU.
10. The device according to claim 8, characterized in that the transcoder unit is specifically for converting the encoded character or the character into a preset binary floating-point type.
11. The device according to claim 10, characterized in that the character submission processing unit comprises: a GPU parameter judgment subunit, a character splitting subunit, and a character processing subunit;
The GPU parameter judgment subunit, for judging whether the length of the binary floating-point type supported by the GPU is not less than the length of the binary floating-point type of the converted character; if not, the character splitting subunit is triggered; if so, the character processing subunit is triggered;
The character splitting subunit, for splitting the character converted into the binary floating-point type into lengths that the GPU can accommodate, and sending the split character of the binary floating-point type to the GPU for processing;
The character processing subunit, for submitting the character converted into the binary floating-point type to the GPU, so that the GPU directly processes the converted character of the binary floating-point type.
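The judgment described in claims 6 and 11, combined with the three floating-point types named in claim 7, can be sketched in Python as follows. This is a hedged illustration rather than the claimed implementation: it assumes the character is an integer code, interprets "length" as whether a type represents that code exactly, and uses the hypothetical names `holds_exactly` and `narrowest_type`. The set of supported types is passed in by the caller, since how it is obtained depends on the GPU computing framework.

```python
import struct

# (name, struct format code) for IEEE 754 half, single, and double precision,
# ordered from narrowest to widest.
FORMATS = (("half", "e"), ("single", "f"), ("double", "d"))


def holds_exactly(code, struct_code):
    """Return True if the integer code survives a round trip through the
    floating-point format without rounding or overflow."""
    try:
        packed = struct.pack("<" + struct_code, float(code))
    except OverflowError:
        # The value is outside the range of the target format.
        return False
    # Python compares int and float exactly, so rounding is detected.
    return struct.unpack("<" + struct_code, packed)[0] == code


def narrowest_type(code, supported):
    """Pick the narrowest supported type that holds the code exactly.
    Returns None when no supported type fits, i.e. splitting is needed."""
    for name, struct_code in FORMATS:
        if name in supported and holds_exactly(code, struct_code):
            return name
    return None
```

Under these assumptions, an ASCII code fits half precision, any Unicode code point fits single precision, and a 32-bit code value may need double precision. A `None` result corresponds to the branch in which the character splitting subunit is triggered.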
CN201511020414.1A 2015-12-30 2015-12-30 Method and device for processing texts in GPU on basis of character encoding conversion Active CN105653506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511020414.1A CN105653506B (en) 2015-12-30 2015-12-30 Method and device for processing texts in GPU on basis of character encoding conversion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511020414.1A CN105653506B (en) 2015-12-30 2015-12-30 Method and device for processing texts in GPU on basis of character encoding conversion

Publications (2)

Publication Number Publication Date
CN105653506A true CN105653506A (en) 2016-06-08
CN105653506B CN105653506B (en) 2019-07-12

Family

ID=56478485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511020414.1A Active CN105653506B (en) 2015-12-30 2015-12-30 Method and device for processing texts in GPU on basis of character encoding conversion

Country Status (1)

Country Link
CN (1) CN105653506B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408028A (en) * 2018-09-21 2019-03-01 东软集团股份有限公司 Floating point arithmetic method, apparatus and storage medium
CN111597802A (en) * 2020-05-14 2020-08-28 支付宝实验室(新加坡)有限公司 Service processing method and device and electronic equipment
CN114445129A (en) * 2022-01-13 2022-05-06 湖北国际物流机场有限公司 BIM coding system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101309483A (en) * 2008-05-29 2008-11-19 深圳华为通信技术有限公司 Short message encoding and decoding method and terminal
US20080291209A1 (en) * 2007-05-25 2008-11-27 Nvidia Corporation Encoding Multi-media Signals
CN102436545A (en) * 2011-10-13 2012-05-02 苏州东方楷模医药科技有限公司 Diversity analysis method based on chemical structure with CPU (Central Processing Unit) acceleration
CN102750290A (en) * 2011-04-22 2012-10-24 深圳创维数字技术股份有限公司 Realizing method of STB (Set Top Box) database and STB
CN103617027A (en) * 2013-10-29 2014-03-05 合一网络技术(北京)有限公司 Android-based method and system for constructing image rendering engine

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080291209A1 (en) * 2007-05-25 2008-11-27 Nvidia Corporation Encoding Multi-media Signals
CN101309483A (en) * 2008-05-29 2008-11-19 深圳华为通信技术有限公司 Short message encoding and decoding method and terminal
CN102750290A (en) * 2011-04-22 2012-10-24 深圳创维数字技术股份有限公司 Realizing method of STB (Set Top Box) database and STB
CN102436545A (en) * 2011-10-13 2012-05-02 苏州东方楷模医药科技有限公司 Diversity analysis method based on chemical structure with CPU (Central Processing Unit) acceleration
CN103617027A (en) * 2013-10-29 2014-03-05 合一网络技术(北京)有限公司 Android-based method and system for constructing image rendering engine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李雨前: "GPU CUDA 海量文本并行处理【开源代码】", 《HTTP://BLOG.SINA.COM.CN/S/BLOG_4D58E3C001010UFV.HTML》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408028A (en) * 2018-09-21 2019-03-01 东软集团股份有限公司 Floating point arithmetic method, apparatus and storage medium
CN109408028B (en) * 2018-09-21 2021-03-05 东软集团股份有限公司 Floating point number operation method and device and storage medium
CN111597802A (en) * 2020-05-14 2020-08-28 支付宝实验室(新加坡)有限公司 Service processing method and device and electronic equipment
CN111597802B (en) * 2020-05-14 2023-08-22 支付宝实验室(新加坡)有限公司 Service processing method and device and electronic equipment
CN114445129A (en) * 2022-01-13 2022-05-06 湖北国际物流机场有限公司 BIM coding system
CN114445129B (en) * 2022-01-13 2024-03-19 湖北国际物流机场有限公司 BIM coding system

Also Published As

Publication number Publication date
CN105653506B (en) 2019-07-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant