CN103473057A - Optimization method of memcpy function - Google Patents
- Publication number
- CN103473057A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an optimization method for the memcpy function, comprising the following steps: 1) copying len bytes of the data to be copied with single-byte copy instructions so that at least one of the source address and the destination address becomes N-byte aligned, where N is the number of bytes that the widest instruction in the system can process at a time; after this copy, the source address/destination address satisfy (x, N) alignment or (N, x) alignment, where x is 1, 2, ..., N; 2) for (x, N) alignment, reading data from the source address into a register with x-byte instructions and writing it from the register to the destination address with N-byte instructions; for (N, x) alignment, reading with N-byte instructions and writing with x-byte instructions; and, when fewer than N bytes remain to be copied, copying the remainder with single-byte copy instructions. The method copies with instructions of the greatest possible width and thereby improves copy efficiency.
Description
Technical field
The present invention relates to a method for optimizing a copy function, and in particular to a method for optimizing the memcpy function.
Background art
The memcpy function is a standard C library function and is used very frequently. It copies the data in one region of memory to another memory location.
Existing optimizations use wide instructions in the implementation of memcpy, for example replacing an instruction that copies 4 bytes at a time with one that copies 8 bytes at a time, thereby improving efficiency.
However, wide instructions place stricter requirements on address alignment. An instruction of width N usually also requires the memory address to be N-aligned. When the source address/destination address passed to memcpy does not meet this requirement, then depending on the hardware platform either the wide instructions cannot be used at all or their performance drops significantly.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the invention is to provide an optimization method for the memcpy function that brings the source address/destination address to as wide an alignment as possible, so that the copy can be completed with wide read and write instructions and the efficiency of memcpy is improved.
The technical solution of the invention is as follows: an optimization method for the memcpy function, characterized by comprising the following steps:
1) copying len bytes of the data to be copied with byte copy instructions so that at least one of the source address and the destination address becomes N-byte aligned, where N is the number of bytes that the widest instruction in the system can process at a time; after this copy, the source address/destination address satisfy (x, N) or (N, x) alignment, where x is 1, 2, ..., N;
2) for (x, N) alignment, reading data from the source address into a register with x-byte instructions and then writing the data from the register to the destination address with N-byte instructions, the remainder being copied with single-byte instructions when fewer than N bytes remain to be copied; for (N, x) alignment, reading data from the source address into a register with N-byte instructions and then writing the data from the register to the destination address with x-byte instructions, the remainder being copied with single-byte instructions when fewer than N bytes remain to be copied.
In one specific embodiment of the invention, len is the difference between N and the larger of the remainders obtained by dividing the source address and the destination address by N.
In another specific embodiment of the invention, before step 1) it is determined whether the length of the data to be copied is greater than an efficiency threshold. When it is greater than the efficiency threshold, step 1) is entered; when it is less than or equal to the efficiency threshold, the copy is completed with byte copy instructions.
For source address/destination address pairs that are poorly aligned, the technical solution provided by the invention "upgrades" their alignment so that the copy can be read and written with the widest possible instructions. Compared with copying under the original alignment using restricted instructions, this improves copy efficiency, reduces copy latency and improves system performance. Checking whether the length of the data to be copied exceeds the efficiency threshold decides between the optimized copy and a plain byte-by-byte copy. This avoids the case where the data is so short that the overhead of the optimization itself cannot be recovered by the gain over the whole copy, and so broadens the applicability of the optimization.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data processing flow of the optimized memcpy function of the invention.
Embodiment
The invention is further described below with reference to an embodiment, which does not limit the invention.
Referring to Fig. 1, the parameters of the memcpy function are explained first: memcpy(dst, src, size) copies size bytes from memory address src (the source address) to memory address dst (the destination address). If src/dst is divisible by 4, the source address/destination address is said to be 4-byte aligned; if src/dst is divisible by 8, it is said to be 8-byte aligned, and so on.
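The alignment definition above can be written as a one-line check. This is a minimal sketch in C, assuming addresses can be inspected through `uintptr_t`; the helper name `is_aligned` is illustrative and does not come from the patent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the address p is a multiple of n, i.e. "n-byte aligned"
   in the sense used by the description. n must be non-zero. */
static bool is_aligned(const void *p, size_t n)
{
    return (uintptr_t)p % n == 0;
}
```

With this helper, an address of 32 counts as 4-byte, 8-byte and 16-byte aligned, while an address of 31 is only 1-byte aligned.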
Suppose the widest instruction in the current system can process N bytes at a time. Efficiency is highest when src and dst are both N-byte aligned. The optimization of the invention mainly targets the case where src/dst are not N-byte aligned: a small number of bytes is first copied with byte copy instructions so that, after this copy, the alignment of src/dst is "upgraded" as far as possible. For example, with N=16, src=15 and dst=31, neither address is 16-byte aligned, and a byte-by-byte copy is inefficient. Here one byte is copied first, giving src=16 and dst=32; both addresses are now 16-byte aligned and can be handled with 16-byte instructions, which improves efficiency.
Specifically, in this embodiment the alignment "upgrade" of src/dst first determines the larger of the remainders of src and dst divided by N; the difference between N and this larger remainder gives the length of the byte copy. The data is then copied with byte copy instructions, advancing the source address/destination address one byte at a time. After the "upgrade", the alignment state of src/dst must be (x, N) or (N, x), where x takes the values 1, 2, 4, ..., N and means that the address is x-byte aligned. That is, after the "upgrade" at least one of src/dst is N-aligned, and the other is 1-byte, 2-byte or 4-byte aligned, and so on, depending on the case; in the best case both the source address and the destination address are N-byte aligned after the "upgrade". Here x is determined by the difference between the remainders of src and dst divided by N (the remainder difference): x is the product of all factors of 2 in the remainder difference, and x is N when the remainder difference is 0. For example, a remainder difference of 1 contains zero factors of 2, so x = 2^0 = 1; a remainder difference of 10 contains one factor of 2 (and one factor of 5), so x = 2^1 = 2. After the upgrade, for (x, N) alignment, data is read from the source address into a register with x-byte instructions and then written from the register to the destination address with N-byte instructions, the remainder being copied with single-byte instructions when fewer than N bytes remain to be copied; for (N, x) alignment, data is read from the source address into a register with N-byte instructions and then written from the register to the destination address with x-byte instructions, the remainder being copied with single-byte instructions when fewer than N bytes remain to be copied.
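The two quantities derived in this paragraph, the byte copy length and the alignment x of the address that does not reach N-byte alignment, can be sketched in C as follows. The function names `upgrade_len` and `residual_alignment` are illustrative assumptions; `a` and `b` stand for the remainders of src and dst divided by N, and N is assumed to be a power of two.

```c
#include <stddef.h>

/* Number of bytes to copy one at a time so that the address with the
   larger remainder lands on an N-byte boundary: N - max(a, b). */
static size_t upgrade_len(size_t a, size_t b, size_t n)
{
    return n - (a > b ? a : b);
}

/* Alignment x of the other address after the upgrade: the product of
   all factors of 2 in the remainder difference, or N when it is 0. */
static size_t residual_alignment(size_t a, size_t b, size_t n)
{
    size_t diff = a > b ? a - b : b - a;
    size_t x = 1;

    if (diff == 0)
        return n;
    while (diff % 2 == 0) {   /* strip one factor of 2 per iteration */
        diff /= 2;
        x *= 2;
    }
    return x;
}
```

For the example in the text (N=16, src=15, dst=31) both remainders are 15, so `upgrade_len` gives 1 and `residual_alignment` gives 16. Note that, taken literally, the formula yields N rather than 0 when both remainders are already 0.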
A more complete and more efficient form of the optimized memcpy function works as follows. With reference to Fig. 1, memcpy contains three modules:
A. An arbitrary-alignment copy module, which can handle any alignment case but is inefficient;
B. An upgrade-parameter computation module, which determines whether the alignment after the "upgrade" is (x, N) or (N, x), determines the value of x, and computes the number of bytes that the "upgrade" needs to copy;
C. A fixed-alignment copy module, which contains 2*log2(N)+1 code blocks corresponding to the 2*log2(N)+1 alignment cases (1, N), (2, N), (4, N), ..., (N, N) and (N, N/2), ..., (N, 2), (N, 1). Each code block uses wide instructions wherever possible.
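The count of 2*log2(N)+1 cases handled by module C can be checked by enumerating them. The sketch below is illustrative only: the struct `align_case` and the function `list_align_cases` are hypothetical names, and N is assumed to be a power of two.

```c
#include <stddef.h>

/* One fixed alignment case: (alignment of src, alignment of dst). */
struct align_case {
    size_t src;
    size_t dst;
};

/* Writes the cases (1,N), (2,N), ..., (N,N), (N,N/2), ..., (N,1)
   into out[] and returns how many were written. The caller must
   provide room for 2*log2(N)+1 entries. */
static size_t list_align_cases(size_t n, struct align_case *out)
{
    size_t count = 0;

    for (size_t x = 1; x <= n; x *= 2)       /* (1,N) ... (N,N)   */
        out[count++] = (struct align_case){ x, n };
    for (size_t x = n / 2; x >= 1; x /= 2)   /* (N,N/2) ... (N,1) */
        out[count++] = (struct align_case){ n, x };
    return count;
}
```

For N=16 this produces 9 cases, matching 2*log2(16)+1.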
When the copy starts, it is first determined whether size in the memcpy parameters is large enough, that is, whether the data to be copied is long enough. The subsequent optimization consumes some computation time; if size is too small, the time spent on the optimization itself cannot be recovered from the time it saves, and the result shows no clear gain or may even be slower. A constant can therefore be set as the efficiency threshold. When size is less than or equal to this constant, the copy is completed directly by module A; when size is greater than this constant, module B is entered. The content of module B is expressed in pseudocode as:
a = src % N    remainder of the source address divided by N
b = dst % N    remainder of the destination address divided by N
len = N - MAX(a, b)    number of bytes the "upgrade" needs to copy
The 2*log2(N)+1 alignment cases can be numbered with a Y-bit binary number:
Y = int(log2(2*log2(N)+1)) + 1
The alignment case after the upgrade, including whether it takes the form (N, x) or (x, N), is encoded into index:
code1 = cmp(a, b)    cmp returns 1 when a >= b, otherwise 0
code2 represents the value of x in (N, x)/(x, N), stored as log2(x)
diff = a - b    the remainder difference
Next, the number of factors of 2 in the remainder difference is counted. In binary this is simply the number of consecutive 0 bits at the low end. The ctz function (count trailing zeros) is used here to express this operation.
set(diff, log2(N)+1)    set bit log2(N)+1 of diff (counting from 1) to handle the special case diff = 0
code2 = ctz(diff)
Some CPUs directly support an instruction with the same function, for example the bsf instruction on Intel (bit scan forward, which returns the index of the lowest set bit):
code2 = bsf(diff)
Some CPUs support only a similar instruction, which needs a conversion. For example, with the clz instruction on MIPS (count leading zeros), diff is combined by bitwise AND with its negation (in two's complement representation), which yields a number of the form "000010000". The number of leading 0 bits and the single 1 bit are then subtracted from the register word length word_len, giving the number of consecutive 0 bits at the low end:
code2 = word_len - clz(diff & -diff) - 1
On either kind of CPU, index = (code1 << (Y-1)) | code2.
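The pseudocode of module B can be made concrete with a portable C sketch. It uses plain loops in place of the hardware ctz and clz instructions, so it illustrates the arithmetic rather than an optimized implementation. All function names are assumptions, N is assumed to be a power of two, and `ctz_via_clz` assumes a 32-bit word and a non-zero argument.

```c
#include <stddef.h>
#include <stdint.h>

/* floor(log2(v)) for v >= 1. */
static unsigned ilog2(size_t v)
{
    unsigned r = 0;
    while (v >>= 1)
        r++;
    return r;
}

/* Y = int(log2(2*log2(N)+1)) + 1: bits needed to number the cases. */
static unsigned index_bits(size_t n)
{
    return ilog2(2 * ilog2(n) + 1) + 1;
}

/* Portable count of trailing zero bits; v must be non-zero. */
static unsigned ctz(size_t v)
{
    unsigned c = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        c++;
    }
    return c;
}

/* index = (code1 << (Y-1)) | code2, with a = src % N and b = dst % N. */
static unsigned case_index(size_t a, size_t b, size_t n)
{
    unsigned code1 = a >= b;          /* 1: (N, x) form, 0: (x, N) form */
    size_t diff = a - b;              /* wraps when a < b; trailing zeros unchanged */
    diff |= (size_t)1 << ilog2(n);    /* sentinel bit so that diff == 0 gives log2(N) */
    unsigned code2 = ctz(diff);       /* log2(x) */
    return (code1 << (index_bits(n) - 1)) | code2;
}

/* Portable count of leading zero bits in a 32-bit word. */
static unsigned clz32(uint32_t v)
{
    unsigned c = 0;
    for (uint32_t m = 0x80000000u; m != 0 && (v & m) == 0; m >>= 1)
        c++;
    return c;
}

/* Trailing zeros derived from clz: word_len - clz(v & -v) - 1, v != 0. */
static unsigned ctz_via_clz(uint32_t v)
{
    return 32 - clz32(v & (0u - v)) - 1;
}
```

For N=16 the index is 4 bits wide: equal remainders give index 12 (code1=1, code2=4, the (16, 16) case), and remainders 3 and 13 give index 1 (code1=0, code2=1, the (2, 16) case).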
After module B completes, module A is entered: it copies len bytes, as computed by module B, to perform the alignment "upgrade", and then module C is entered.
Module C contains an entry table of length 2^Y (with 2*log2(N)+1 valid entries). Each valid entry in the table stores the start address of the code block in module C that handles one fixed alignment case. For (x, N) alignment, data is read from the source address into a register with x-byte instructions and then written from the register to the destination address with N-byte instructions; for (N, x) alignment, data is read from the source address into a register with N-byte instructions and then written from the register to the destination address with x-byte instructions. The index obtained by module B is used to index the entry table of module C, which yields the start address of the code block to use, and the copy is performed.
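The layout of the entry table can be sketched as follows. In place of code block start addresses, each entry here stores the load and store widths that the corresponding code block would use; this substitution, the struct `entry` and the function `build_entry_table` are assumptions made for illustration. N is assumed to be a power of two.

```c
#include <stddef.h>

/* Stand-in for one table slot: a real implementation would hold the
   start address of a code block instead of the two widths. */
struct entry {
    int valid;
    size_t load_width;    /* bytes per read from the source address */
    size_t store_width;   /* bytes per write to the destination address */
};

/* floor(log2(v)) for v >= 1. */
static unsigned ilog2(size_t v)
{
    unsigned r = 0;
    while (v >>= 1)
        r++;
    return r;
}

/* Fills table[] for widest width n and returns its length 2^Y,
   or 0 when capacity is too small. */
static size_t build_entry_table(size_t n, struct entry *table, size_t capacity)
{
    unsigned log2n = ilog2(n);
    unsigned y = ilog2(2 * log2n + 1) + 1;
    size_t len = (size_t)1 << y;

    if (len > capacity)
        return 0;
    for (size_t i = 0; i < len; i++)
        table[i] = (struct entry){ 0, 0, 0 };

    for (unsigned k = 0; k <= log2n; k++) {
        size_t x = (size_t)1 << k;
        /* code1 = 0: destination is N-aligned, source is x-aligned.
           x = N cannot occur here because equal remainders give code1 = 1. */
        if (k < log2n)
            table[k] = (struct entry){ 1, x, n };
        /* code1 = 1: source is N-aligned, destination is x-aligned. */
        table[((size_t)1 << (y - 1)) | k] = (struct entry){ 1, n, x };
    }
    return len;
}
```

For N=16 the table has 16 slots, of which 9 are valid, consistent with the 2*log2(N)+1 valid entries stated above.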
It is then determined whether all the data to be copied has been processed. If so, the whole function ends. If not, module A is entered to copy the remaining data that cannot be handled by wide instructions, and the function then ends.
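The overall flow (threshold check, byte-wise "upgrade", wide copy, byte-wise tail) can be sketched end to end in C. Several simplifications are assumed here and are not taken from the patent: N is fixed at 8, the efficiency threshold is an arbitrary 32 bytes, fixed-size `memcpy` calls stand in for the x-byte and N-byte load and store instructions, and a loop parameterized by x replaces the entry table and its per-case code blocks. The name `memcpy_upgraded` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { WIDE = 8 };               /* N: assumed widest access, in bytes */
enum { SMALL_COPY_LIMIT = 32 };  /* hypothetical efficiency threshold  */

/* Module A: byte-by-byte copy that works for any alignment. */
static void copy_bytes(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

void *memcpy_upgraded(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (size <= SMALL_COPY_LIMIT) {          /* too short to pay off */
        copy_bytes(d, s, size);
        return dst;
    }

    /* Module B: upgrade parameters. */
    size_t a = (uintptr_t)s % WIDE;
    size_t b = (uintptr_t)d % WIDE;
    size_t len = WIDE - (a > b ? a : b);
    size_t diff = a >= b ? a - b : b - a;
    size_t x = diff == 0 ? WIDE : (diff & (~diff + 1)); /* lowest set bit */
    int src_is_wide = a >= b;                /* 1: (N, x), 0: (x, N) */

    /* Module A: the byte-wise "upgrade". */
    copy_bytes(d, s, len);
    d += len; s += len; size -= len;

    /* Module C: one N-byte access on the aligned side, x-byte
       accesses on the other side, staged through reg. */
    while (size >= WIDE) {
        unsigned char reg[WIDE];
        if (src_is_wide) {
            memcpy(reg, s, WIDE);                     /* N-byte load   */
            for (size_t i = 0; i < WIDE; i += x)
                memcpy(d + i, reg + i, x);            /* x-byte stores */
        } else {
            for (size_t i = 0; i < WIDE; i += x)
                memcpy(reg + i, s + i, x);            /* x-byte loads  */
            memcpy(d, reg, WIDE);                     /* N-byte store  */
        }
        d += WIDE; s += WIDE; size -= WIDE;
    }

    copy_bytes(d, s, size);                  /* Module A: the tail */
    return dst;
}
```

Because the threshold is larger than N, the upgrade length never exceeds the remaining size. Whether this sketch is faster than a byte loop depends on how the compiler lowers the fixed-size copies on a given platform.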
Claims (3)
1. An optimization method for the memcpy function, characterized by comprising the following steps:
1) copying len bytes of the data to be copied with byte copy instructions so that at least one of the source address and the destination address becomes N-byte aligned, where N is the number of bytes that the widest instruction in the system can process at a time; after this copy, the source address/destination address satisfy (x, N) or (N, x) alignment, where x is 1, 2, ..., N;
2) for (x, N) alignment, reading data from the source address into a register with x-byte instructions and then writing the data from the register to the destination address with N-byte instructions, the remainder being copied with single-byte instructions when fewer than N bytes remain to be copied; for (N, x) alignment, reading data from the source address into a register with N-byte instructions and then writing the data from the register to the destination address with x-byte instructions, the remainder being copied with single-byte instructions when fewer than N bytes remain to be copied.
2. The optimization method for the memcpy function according to claim 1, characterized in that len is the difference between N and the larger of the remainders obtained by dividing the source address and the destination address by N.
3. The optimization method for the memcpy function according to claim 1, characterized in that, before step 1), it is determined whether the length of the data to be copied is greater than an efficiency threshold; when it is greater than the efficiency threshold, step 1) is entered; when it is less than or equal to the efficiency threshold, the copy is completed with byte copy instructions.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2013104082595A CN103473057A (en) | 2013-09-10 | 2013-09-10 | Optimization method of memcpy function |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN103473057A | 2013-12-25 |
Family
ID=49797929
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2013104082595A CN103473057A (en) | 2013-09-10 | 2013-09-10 | Optimization method of memcpy function |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103473057A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10540306B2 (en) | 2015-06-30 | 2020-01-21 | Huawei Technologies Co., Ltd. | Data copying method, direct memory access controller, and computer system |
| CN110990298A (en) * | 2019-12-02 | 2020-04-10 | 龙芯中科(合肥)技术有限公司 | Data copy processing method and device, electronic equipment and storage medium |
| CN110990298B (en) * | 2019-12-02 | 2022-03-08 | 龙芯中科(合肥)技术有限公司 | Data copy processing method and device, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| C06 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| C10 | Entry into substantive examination | ||
| WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20131225 |
|
| C02 | Deemed withdrawal of patent application after publication (patent law 2001) |