CN103473057A - Optimization method of memcpy function - Google Patents

Publication number
CN103473057A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013104082595A
Other languages
Chinese (zh)
Inventor
张福新
陈杰
王锐
吴少刚
张斌
晏华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU LEMOTE TECHNOLOGY Corp Ltd
Original Assignee
JIANGSU LEMOTE TECHNOLOGY Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-09-10
Filing date
2013-09-10
Publication date
2013-12-25
Application filed by JIANGSU LEMOTE TECHNOLOGY Corp Ltd
Priority to CN2013104082595A
Publication of CN103473057A

Abstract

The invention discloses a method for optimizing the memcpy function, comprising the following steps: 1) copying len bytes of the data to be copied with single-byte copy instructions so that at least one of the source address and the destination address is N-byte aligned, where N is the number of bytes that the widest instruction in the system can process at a time; after this copy, the source/destination addresses are (x, N) aligned or (N, x) aligned, where x is 1, 2, ..., N; 2) for (x, N) alignment, reading data from the source address into a register with x-byte instructions and storing the data from the register to the destination address with N-byte instructions; for (N, x) alignment, reading with N-byte instructions and storing with x-byte instructions; and, when fewer than N bytes remain to be copied, copying the remainder with single-byte copy instructions. The method copies with instructions that are as wide as possible and thereby improves copy efficiency.

Description

Optimization method of memcpy function
Technical field
The present invention relates to a method for optimizing a copy function, and in particular to a method for optimizing the memcpy function.
Background art
The memcpy function is a standard function of the C language and is used very frequently. It copies data from one memory location to another.
Existing optimization techniques use wide instructions in the implementation of memcpy, for example replacing an instruction that copies 4 bytes at a time with an instruction that copies 8 bytes at a time, thereby improving efficiency.
However, wide instructions impose stricter requirements on address alignment. An instruction that is N bytes wide usually requires the memory address to be N-byte aligned as well. When the source/destination address of memcpy does not meet the alignment requirement, then, depending on the hardware platform, either the wide instructions cannot be used at all or their performance drops significantly.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide a method for optimizing the memcpy function that brings the source/destination addresses to as wide an alignment as possible, so that wide read and write instructions can complete the copy and memcpy becomes more efficient.
The technical solution of the present invention is as follows: a method for optimizing the memcpy function, characterized by comprising the following steps:
1) copying len bytes of the data to be copied with byte copy instructions so that at least one of the source address and the destination address is N-byte aligned, where N is the number of bytes that the widest instruction in the system can process at a time; after this copy, the source/destination addresses are (x, N) aligned or (N, x) aligned, where x is 1, 2, ..., N;
2) for (x, N) alignment, reading data from the source address into a register with x-byte instructions, then storing the data from the register to the destination address with N-byte instructions, and copying with single-byte instructions when fewer than N bytes remain to be copied; for (N, x) alignment, reading data from the source address into a register with N-byte instructions, then storing the data from the register to the destination address with x-byte instructions, and copying with single-byte instructions when fewer than N bytes remain to be copied.
In one specific embodiment of the present invention, len is the difference between N and the larger of the remainders obtained by dividing the source address and the destination address by N.
In another specific embodiment of the present invention, before step 1) it is determined whether the length of the data to be copied is greater than an efficiency threshold. When the length is greater than the efficiency threshold, step 1) is entered; when the length is less than or equal to the efficiency threshold, the copy is completed with byte copy instructions.
For cases where the source/destination addresses are poorly aligned, the technical solution provided by the invention "upgrades" their alignment so that the copy can be read and written with instructions that are as wide as possible. Compared with copying under the original alignment using the narrower instructions that alignment permits, this improves copy efficiency, reduces copy latency, and enhances system performance. Checking whether the length of the data to be copied exceeds the efficiency threshold decides whether the copy uses the optimized instructions or byte copy instructions. This avoids the case where the data is so short that the overhead of the optimization itself cannot be recovered by the gain in overall copy efficiency, which would defeat the optimization, and so widens the applicability of the optimized instructions.
Brief description of the drawings
Fig. 1 is a schematic flow chart of data processing in the optimized memcpy function of the present invention.
Detailed description of the embodiment
The invention is further described below with reference to an embodiment, which does not limit the invention.
Referring to Fig. 1, the parameters of memcpy are explained first: memcpy(dst, src, size) copies size bytes from memory address src (the source address) to memory address dst (the destination address). If src/dst is divisible by 4, the source/destination address is said to be 4-byte aligned; if src/dst is divisible by 8, it is said to be 8-byte aligned; and so on.
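The divisibility test that defines alignment can be sketched in C as follows. The helper name is_aligned is hypothetical and not part of the patent, and the sketch assumes that the alignment n is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when addr is a multiple of n, 0 otherwise.
 * Assumption: n is a power of two, so the remainder can be
 * taken with a bit mask instead of a division. */
static int is_aligned(uintptr_t addr, size_t n)
{
    return (addr & (uintptr_t)(n - 1)) == 0;
}
```

For instance, the address 12 is 4-byte aligned but not 8-byte aligned under this definition.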
Suppose the widest instruction in the current system can process N bytes at a time. Efficiency is highest when src and dst are both N-byte aligned. The optimization of the invention mainly targets the case where src/dst are not N-byte aligned: a small number of bytes are first copied with byte copy instructions so that, after this copy, the alignment of src/dst is "upgraded" as far as possible. For example, with N=16, src=15 and dst=31, neither address is 16-byte aligned, and copying byte by byte is inefficient. Here one byte is copied first, giving src=16 and dst=32; both addresses are now 16-byte aligned and can be handled with 16-byte-wide instructions, which improves efficiency.
Specifically, in this embodiment, the alignment "upgrade" of src/dst first determines the larger of the remainders of src and dst divided by N; the difference between N and this larger remainder gives the byte copy length. Byte copy instructions then copy that many bytes of the data to be copied, advancing the source and destination addresses one byte at a time.
The alignment state of src/dst after the "upgrade" is necessarily (x, N) or (N, x), where x is 1, 2, 4, ..., N and denotes x-byte alignment. That is, after the "upgrade" at least one of src and dst is N-byte aligned, and the other is 1-byte aligned, 2-byte aligned, 4-byte aligned and so on, depending on the case; in the best case both the source and destination addresses are N-byte aligned after the "upgrade".
Here x is determined by the remainder difference, i.e. the difference between the remainders of src and dst divided by N: x is the product of all the factors of 2 in the remainder difference, and x is N when the remainder difference is 0. For example, a remainder difference of 1 contains zero factors of 2, so x = 2^0 = 1; a remainder difference of 10 contains one factor of 2 and one factor of 5, so x = 2^1 = 2.
After the upgrade, for (x, N) alignment, data is read from the source address into a register with x-byte instructions and then stored from the register to the destination address with N-byte instructions, and single-byte instructions copy the rest when fewer than N bytes remain to be copied; for (N, x) alignment, data is read from the source address into a register with N-byte instructions and then stored from the register to the destination address with x-byte instructions, and single-byte instructions copy the rest when fewer than N bytes remain to be copied.
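The calculation of the byte copy length and of x can be illustrated with a short C sketch. The struct upgrade and the function plan_upgrade are hypothetical names introduced only for this illustration, and n is assumed to be a power of two. Note that the formula len = N - max(a, b) gives len = N when both addresses are already aligned; the sketch follows the formula as written.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of planning the alignment "upgrade" (illustrative type). */
struct upgrade {
    size_t len;       /* bytes to copy one at a time before wide copying */
    size_t src_align; /* alignment of the source after those bytes */
    size_t dst_align; /* alignment of the destination after those bytes */
};

static struct upgrade plan_upgrade(uintptr_t src, uintptr_t dst, size_t n)
{
    size_t a = (size_t)(src % n);            /* remainder of the source */
    size_t b = (size_t)(dst % n);            /* remainder of the destination */
    size_t diff = a > b ? a - b : b - a;     /* remainder difference */
    /* x is the product of all factors of 2 in diff, i.e. its lowest
     * set bit; a difference of 0 means both addresses end up n-aligned. */
    size_t x = diff ? (diff & (~diff + 1)) : n;
    struct upgrade u;

    u.len = n - (a > b ? a : b);
    /* The address with the larger remainder reaches n-byte alignment. */
    u.src_align = (a >= b) ? n : x;
    u.dst_align = (a >= b) ? x : n;
    return u;
}
```

With N=16, src=15 and dst=31 this yields len = 1 and (16, 16) alignment, matching the example in the text. With src=6 and dst=16 the remainder difference is 6, so len = 10 and the result is (N, x) = (16, 2).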
A more complete and more efficient way of processing in the optimized memcpy function is as follows. With reference to Fig. 1, memcpy contains three modules:
A. Arbitrary-alignment copy module: this module can handle any alignment situation, but is inefficient;
B. Upgrade-parameter calculation module: calculates whether the alignment after the "upgrade" is (x, N) or (N, x) and determines the value of x; calculates the length that the "upgrade" needs to copy;
C. Fixed-alignment copy module: contains 2*log2(N)+1 code blocks that handle, respectively, the 2*log2(N)+1 alignment cases (1, N), (2, N), (4, N), ..., (N, N) and (N, N/2), ..., (N, 2), (N, 1); the code blocks use wide instructions wherever possible.
When the copy process starts, it is first determined whether size in the memcpy parameters is large enough, i.e. whether the data to be copied is long enough. The subsequent optimization consumes some computation time; if size is too small, the time spent on the optimization cannot be recovered from the time it saves, so the optimization has little effect and may even be slower. A constant can therefore be set here as the efficiency threshold: when size is less than this constant, the copy is completed directly by module A; when size is greater than this constant, module B is entered. Module B can be expressed in pseudocode as:
a = src % N    (remainder of the source address divided by N)
b = dst % N    (remainder of the destination address divided by N)
len = N - MAX(a, b)    (number of bytes that the "upgrade" step needs to copy)
The 2*log2(N)+1 alignment cases can be represented by a number of Y binary digits:
Y = int(log2(2*log2(N)+1)) + 1
Next, whether the situation after the upgrade takes the form (N, x) or (x, N) is determined, and the alignment case is encoded into index:
code1 = cmp(a, b)    (cmp returns 1 when a >= b, otherwise 0)
code2 encodes the value of x in (N, x)/(x, N)
diff = a - b    (the remainder difference)
Count the factors of 2 in the remainder difference. In binary this is simply the number of consecutive 0 bits at the low-order end. The ctz function (count trailing zeros) is used here to express this operation.
set(diff, log2(N)+1)    (set bit number log2(N)+1 of diff, to handle the special case diff = 0)
code2 = ctz(diff)
Some CPUs directly support an instruction with the same function, for example the bsf instruction on Intel (bit scan forward, which returns the index of the lowest set bit):
code2 = bsf(diff)
Some CPUs support only a similar instruction, which needs a conversion. For example, with the clz instruction (count leading zeros) on MIPS, diff is ANDed bitwise with its negation (in the machine's two's complement representation), which yields a number of the form "000010000". Subtracting from the register width word_len the number of consecutive high-order 0 bits and the single 1 bit then gives the number of consecutive low-order 0 bits:
code2 = word_len - clz(diff & -diff) - 1
On both kinds of CPU, index = (code1 << (Y-1)) | code2.
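The index encoding can be written out in C as a sketch. All function names here are hypothetical, the word length is assumed to be 32 bits, and the bit counts use portable loops rather than CPU instructions (compilers such as GCC and Clang also offer built-ins like __builtin_ctz for this purpose). ctz_via_clz illustrates the conversion word_len - clz(diff & -diff) - 1.

```c
#include <stdint.h>

/* Portable count of trailing zero bits; v must be nonzero. */
static unsigned ctz32(uint32_t v)
{
    unsigned n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Portable count of leading zero bits; v must be nonzero. */
static unsigned clz32(uint32_t v)
{
    unsigned n = 0;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Trailing zeros derived from clz: word_len - clz(v & -v) - 1. */
static unsigned ctz_via_clz(uint32_t v)
{
    uint32_t lowest = v & (uint32_t)(0u - v);  /* isolates the lowest set bit */
    return 32u - clz32(lowest) - 1u;
}

/* floor(log2(v)) for v >= 1. */
static unsigned ilog2(uint32_t v)
{
    unsigned r = 0;
    while (v >>= 1)
        r++;
    return r;
}

/* index = (code1 << (Y-1)) | code2; n must be a power of two. */
static unsigned alignment_index(uintptr_t src, uintptr_t dst, uint32_t n)
{
    uint32_t a = (uint32_t)(src % n);
    uint32_t b = (uint32_t)(dst % n);
    unsigned log2n = ilog2(n);
    unsigned y = ilog2(2u * log2n + 1u) + 1u;  /* Y = int(log2(2*log2(N)+1)) + 1 */
    unsigned code1 = (a >= b) ? 1u : 0u;
    /* a - b wraps for a < b, which leaves the trailing zeros unchanged;
     * OR-ing in n sets bit log2(N) so that diff == 0 yields log2(N). */
    uint32_t diff = (uint32_t)(a - b) | n;
    unsigned code2 = ctz32(diff);

    return (code1 << (y - 1u)) | code2;
}
```

With N=16, src=15 and dst=31, Y is 4, code1 is 1 and code2 is 4, so the index is (1 << 3) | 4 = 12, the entry for the (N, N) case.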
After module B completes, module A is entered: it copies the number of bytes len obtained by module B to perform the alignment "upgrade", and then module C is entered.
Module C holds an entry table of length 2^Y (of which 2*log2(N)+1 entries are valid). Each valid entry in the table stores the start address of the code block in module C that handles one fixed alignment case. For (x, N) alignment, data is read from the source address into a register with x-byte instructions and then stored from the register to the destination address with N-byte instructions; for (N, x) alignment, data is read from the source address into a register with N-byte instructions and then stored from the register to the destination address with x-byte instructions. Using the index obtained by module B to look up the entry table of module C gives the start address of the code block that is needed, and the copy is carried out.
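One way to picture the entry table is the following C sketch for N = 8, where Y = 3 and the table has 2^3 = 8 slots, 7 of them valid. Everything here is an assumption made for illustration: the names, the choice N = 8, and in particular the use of fixed-size memcpy calls to stand in for the x-byte and N-byte load and store instructions, which a real implementation would more likely write in assembly.

```c
#include <stddef.h>
#include <string.h>

enum { N = 8 };  /* assumed widest access width in bytes */

/* A code block copies whole N-byte groups and returns the bytes copied. */
typedef size_t (*copy_block_fn)(unsigned char *dst,
                                const unsigned char *src, size_t size);

/* Defines one code block that loads RD bytes at a time into a
 * register-sized buffer and stores WR bytes at a time from it. */
#define DEFINE_BLOCK(name, RD, WR)                                      \
    static size_t name(unsigned char *dst, const unsigned char *src,    \
                       size_t size)                                     \
    {                                                                   \
        size_t done = 0;                                                \
        while (size - done >= N) {                                      \
            unsigned char reg[N]; /* stands in for an N-byte register */ \
            for (size_t o = 0; o < N; o += (RD))                        \
                memcpy(reg + o, src + done + o, (RD));                  \
            for (size_t o = 0; o < N; o += (WR))                        \
                memcpy(dst + done + o, reg + o, (WR));                  \
            done += N;                                                  \
        }                                                               \
        return done;                                                    \
    }

DEFINE_BLOCK(copy_1_8, 1, 8)   /* (1, N): 1-byte loads, N-byte stores */
DEFINE_BLOCK(copy_2_8, 2, 8)
DEFINE_BLOCK(copy_4_8, 4, 8)
DEFINE_BLOCK(copy_8_1, 8, 1)   /* (N, 1): N-byte loads, 1-byte stores */
DEFINE_BLOCK(copy_8_2, 8, 2)
DEFINE_BLOCK(copy_8_4, 8, 4)
DEFINE_BLOCK(copy_8_8, 8, 8)   /* (N, N): both sides fully aligned */

/* Indexed by (code1 << (Y-1)) | code2 with Y = 3. Slot 3 is unused:
 * code2 = log2(N) only occurs when the remainders are equal, and in
 * that case code1 is 1. */
static const copy_block_fn entry_table[1 << 3] = {
    copy_1_8, copy_2_8, copy_4_8, NULL,
    copy_8_1, copy_8_2, copy_8_4, copy_8_8
};
```

A caller would invoke entry_table[index](dst, src, size) and then copy the remaining size minus the returned count byte by byte.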
It is then determined whether all the data to be copied has been processed. If so, the whole function ends; if not, module A is entered to copy the remaining data that cannot be handled by wide instructions, and the function then ends.
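The overall flow through the three modules can be sketched end to end in C. This is an illustration under stated assumptions rather than the patented implementation: N = 8 and THRESHOLD = 32 are arbitrary values, the names sketch_memcpy, copy_bytes and copy_groups are hypothetical, and fixed-width memcpy calls stand in for the x-byte and N-byte instructions. Module C is reduced to a single parameterized loop instead of a table of code blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { N = 8, THRESHOLD = 32 };  /* both values are assumptions */

/* Module A: byte-by-byte copy that works for any alignment. */
static void copy_bytes(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

/* Module C stand-in: copies whole N-byte groups with rd-byte loads and
 * wr-byte stores; returns the number of bytes copied. */
static size_t copy_groups(unsigned char *d, const unsigned char *s,
                          size_t size, size_t rd, size_t wr)
{
    size_t done = 0;
    while (size - done >= N) {
        unsigned char reg[N];  /* stands in for an N-byte register */
        for (size_t o = 0; o < N; o += rd)
            memcpy(reg + o, s + done + o, rd);
        for (size_t o = 0; o < N; o += wr)
            memcpy(d + done + o, reg + o, wr);
        done += N;
    }
    return done;
}

void *sketch_memcpy(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Short copies skip the optimization and go straight to module A. */
    if (size <= THRESHOLD) {
        copy_bytes(d, s, size);
        return dst;
    }

    /* Module B: upgrade length and the alignment x of the other address. */
    size_t a = (size_t)((uintptr_t)s % N);
    size_t b = (size_t)((uintptr_t)d % N);
    size_t len = N - (a > b ? a : b);
    size_t diff = a > b ? a - b : b - a;
    size_t x = diff ? (diff & (~diff + 1)) : N;

    /* Module A performs the alignment "upgrade". */
    copy_bytes(d, s, len);
    d += len;
    s += len;
    size -= len;

    /* Module C: (N, x) when the source reached N alignment, else (x, N). */
    size_t done = (a >= b) ? copy_groups(d, s, size, N, x)
                           : copy_groups(d, s, size, x, N);

    /* Module A copies the tail of fewer than N bytes. */
    copy_bytes(d + done, s + done, size - done);
    return dst;
}
```

Because THRESHOLD is at least N, size is always larger than len when the upgrade runs, so the subtraction size - len cannot underflow in this sketch.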

Claims (3)

1. A method for optimizing a memcpy function, characterized by comprising the following steps:
1) copying len bytes of the data to be copied with byte copy instructions so that at least one of the source address and the destination address is N-byte aligned, where N is the number of bytes that the widest instruction in the system can process at a time; after this copy, the source/destination addresses are (x, N) aligned or (N, x) aligned, where x is 1, 2, ..., N;
2) for (x, N) alignment, reading data from the source address into a register with x-byte instructions, then storing the data from the register to the destination address with N-byte instructions, and copying with single-byte instructions when fewer than N bytes remain to be copied; for (N, x) alignment, reading data from the source address into a register with N-byte instructions, then storing the data from the register to the destination address with x-byte instructions, and copying with single-byte instructions when fewer than N bytes remain to be copied.
2. The method for optimizing a memcpy function according to claim 1, characterized in that len is the difference between N and the larger of the remainders obtained by dividing the source address and the destination address by N.
3. The method for optimizing a memcpy function according to claim 1, characterized in that, before step 1), it is determined whether the length of the data to be copied is greater than an efficiency threshold; when the length is greater than the efficiency threshold, step 1) is entered; when the length is less than or equal to the efficiency threshold, the copy is completed with byte copy instructions.
CN2013104082595A 2013-09-10 2013-09-10 Optimization method of memcpy function Pending CN103473057A (en)

Publications (1)

Publication Number Publication Date
CN103473057A true CN103473057A (en) 2013-12-25

Family

ID=49797929

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131225

C02 Deemed withdrawal of patent application after publication (patent law 2001)