CN101452465A - Mass file data storing and reading method - Google Patents

Info

Publication number: CN101452465A
Application number: CNA2007101990028A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: file, data, small files, read, large
Legal status: Pending (as listed by Google Patents, which notes that this is an assumption and not a legal conclusion)
Inventor: 易国真
Current assignee: Autonavi Software Co Ltd
Original assignee: Autonavi Software Co Ltd

Abstract

The invention discloses a method for storing large batches of file data, comprising: merging the data of all small files into one large file; establishing a one-to-one correspondence between the file name and the file number of each small file; and establishing a correspondence between each file number and the file information of the small file, the file information including the position of the small file within the large file. Correspondingly, the invention also discloses a method for reading large batches of file data, used to read file data stored by the storing method. The reading method comprises: obtaining the file number of a small file from its file name; obtaining the file information of the small file from the file number; obtaining the position of the small file within the large file from the file information; and reading the data of the small file through the I/O interface of the large file according to that position. The technical solution of the invention saves system resources.

Description

Method for storing and reading mass file data
Technical field
The present invention relates to the field of software technology, and in particular to a method for storing and reading mass file data.
Background
With the spread of computers and their applications, more and more file data needs to be stored and read. Here, file data refers generally to any data file that can be saved on a computer's hard disk, and storing and reading (access) means saving data from computer memory into a file and reading data from a file into computer memory. Accessing a small number of files has little effect on system resources, but the way a large batch of data files is accessed directly affects how system resources are used. Here, a large batch means more than 10 files.
As shown in Figure 1, mass file data is usually accessed by saving each file separately on the computer's hard disk and then creating a separate file I/O (input/output) interface for each file, through which that file is operated on. Here, an I/O interface refers specifically to an I/O interface to file data.
Because the large batch of files is thus scattered across the hard disk, this approach causes several problems for the application system as a whole. Here, an application system means a piece of computer software whose normal operation depends on the data provided by some data files.
First, every time the application system wants to access a file, it must create a new file I/O interface for that file, so much time is wasted creating and destroying I/O interfaces and many system resources are consumed;
Second, scattering the files across the hard disk is bad for the safety of the application system, because the scattered files may be deleted or modified by mistake, causing the application system to fail;
Moreover, scattering the files across the hard disk is bad for the portability of the application system: after the system is moved to another machine, every one of its data files must be moved with it, and some may be missed during copying, causing the application system to fail.
Summary of the invention
The problem to be solved by the present invention is to provide a method for storing and reading mass file data that conserves system resources.
To solve this problem, the mass file data storing method of the present invention comprises:
Merging the data of all small files into one large file;
Establishing a one-to-one correspondence between the file name and the file number of each small file;
Establishing a correspondence between each file number and the file information of the small file, the file information including the position of the small file within the large file.
The file information further includes the file size of the small file.
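The three storing steps can be illustrated with a minimal in-memory sketch in C++, the language of the pseudocode later in this description. It assumes the large file can be modeled as a std::string and uses std::map for both tables; the names FileInfo and PackWriter are hypothetical and do not come from the patent.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical record: where one small file lives inside the large file.
struct FileInfo {
    uint32_t id;      // file number
    int64_t offset;   // start position of the small file in the large file
    int64_t length;   // file size in bytes
};

// In-memory sketch of the storing method; the large file is modeled as a string.
class PackWriter {
public:
    // Stores one small file and returns the file number assigned to it.
    uint32_t add(const std::string& name, const std::string& data) {
        assert(nameToId_.count(name) == 0 && "file names must be unique");
        const uint32_t id = nextId_++;
        // Step 1: merge the data into the large file by appending at its end.
        const int64_t offset = static_cast<int64_t>(big_.size());
        big_ += data;
        // Step 2: one-to-one mapping from file name to file number.
        nameToId_[name] = id;
        // Step 3: file number -> file information (position and size).
        idToInfo_[id] = FileInfo{id, offset, static_cast<int64_t>(data.size())};
        return id;
    }

    const std::string& bigFile() const { return big_; }
    const std::map<std::string, uint32_t>& nameTable() const { return nameToId_; }
    const std::map<uint32_t, FileInfo>& infoTable() const { return idToInfo_; }

private:
    std::string big_;                           // contents of the large file
    std::map<std::string, uint32_t> nameToId_;  // file name -> file number
    std::map<uint32_t, FileInfo> idToInfo_;     // file number -> file information
    uint32_t nextId_ = 0;                       // next file number to hand out
};
```

Because every small file is appended at the current end of the large file, its offset is simply the large file's length before the append: adding "1.bmp" (4 bytes) and then "2.bmp" (2 bytes) gives "2.bmp" file number 1, offset 4 and length 2.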
Correspondingly, the mass file data reading method of the present invention, which reads file data stored by the storing method above, comprises the steps of:
Obtaining the file number of a small file from its file name;
Obtaining the file information of the small file from the file number;
Obtaining the position of the small file within the large file from the file information;
Reading the data of the small file through the I/O interface of the large file, according to that position.
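A matching sketch of the four reading steps follows. It is hypothetical: it takes the two tables in the form used by the storing sketch, treats any std::istream (for example a single std::ifstream opened on the large file) as the one I/O interface, and returns std::nullopt when a lookup or read fails. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <optional>
#include <sstream>  // std::istringstream can stand in for the large file in tests
#include <string>

// Hypothetical record: where one small file lives inside the large file.
struct FileInfo {
    uint32_t id;      // file number
    int64_t offset;   // start position in the large file
    int64_t length;   // file size in bytes
};

// Reads one small file through the single stream opened on the large file.
// Returns std::nullopt if a lookup fails or the large file is too short.
std::optional<std::string> readSmallFile(std::istream& bigFile,
                                         const std::map<std::string, uint32_t>& nameToId,
                                         const std::map<uint32_t, FileInfo>& idToInfo,
                                         const std::string& name) {
    // Step 1: file name -> file number.
    auto idIt = nameToId.find(name);
    if (idIt == nameToId.end()) return std::nullopt;
    // Step 2: file number -> file information.
    auto infoIt = idToInfo.find(idIt->second);
    if (infoIt == idToInfo.end()) return std::nullopt;
    const FileInfo& info = infoIt->second;
    assert(info.offset >= 0 && info.length >= 0);
    // Step 3: file information -> position; clear() resets EOF left by earlier reads.
    bigFile.clear();
    bigFile.seekg(info.offset);
    // Step 4: read exactly `length` bytes from that position.
    std::string data(static_cast<std::string::size_type>(info.length), '\0');
    if (!bigFile.read(&data[0], static_cast<std::streamsize>(info.length))) return std::nullopt;
    return data;
}
```

The same stream is reused for every small file, which is the point of the method: no per-file open and close is needed.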
Compared with the prior art, the storing and reading methods of the present invention have the following beneficial effect:
A large batch of small files is merged into one large file, a one-to-one correspondence is established between each small file's name and its file number, and a correspondence is established between each file number and the small file's file information, which includes its position in the large file. Knowing the name of a small file is therefore enough to obtain its file number; the file number gives its file information, i.e., its position in the large file; and the small file's data can then be read through the I/O interface of the large file alone. Because no separate I/O interface has to be created and destroyed for each small file, system resources are greatly saved.
Description of the drawings
Fig. 1 is a schematic diagram of prior-art access to mass file data;
Fig. 2 is a flowchart of the mass file data storing method of the present invention;
Fig. 3 is a flowchart of the mass file data reading method of the present invention;
Fig. 4 is a schematic diagram of the correspondence between the file names of small files and their file numbers;
Fig. 5 is a schematic diagram of the correspondence between the file numbers of small files and their file information;
Fig. 6 is a schematic diagram of merging a large batch of small files into one large file;
Fig. 7 is a schematic diagram of accessing the merged large file.
Embodiment
As shown in Figure 2, the mass file data storing method of the present invention comprises:
Step 1) merging the data of all small files into one large file;
Step 2) establishing a one-to-one correspondence between the file name and the file number of each small file;
Step 3) establishing a correspondence between each file number and the file information of the small file, the file information including the position of the small file within the large file.
The file information further includes the file size of the small file.
As described above, the storing method merges the data of all small files into one large file, i.e., a single independent file. To store data in and read data from this large file, only one I/O interface to it needs to be created, and all of its data can be stored and read through that interface, as shown in Figure 7. As shown in Figure 6, the data of the large file consists of the data of the small files, one after another.
Because the method also establishes a one-to-one correspondence between the file name and the file number of each small file, as shown in Figure 4, the file name of each small file corresponds to exactly one file number. As shown in Figure 5, the method further establishes a correspondence between each small file's number and its file information in the large file, so knowing a small file's number is enough to find its file information.
Correspondingly, as shown in Figure 3, the mass file data reading method of the present invention, which reads file data stored by the method described above, comprises:
Step 11) obtaining the file number of a small file from its file name;
Step 12) obtaining the file information of the small file from the file number;
Step 13) obtaining the position of the small file within the large file from the file information;
Step 14) reading the data of the small file through the I/O interface of the large file, according to that position.
As described above, once a large batch of small files has been stored with the storing method of the present invention, the one-to-one correspondence between each small file's name and its file number lets the file number be obtained from the name of the small file to be read. Because a correspondence between file numbers and file information has also been established, the file number in turn gives the small file's file information in the large file, such as its start position and its size, and the small file's data can then be read through the I/O interface of the large file.
The following example uses the method of the present invention to store and read a large batch of small files.
Suppose there are several small files, "1.bmp", "2.bmp" and "3.bmp". After the five global variables defined in step B) below have been declared, calling PutshFile("c:/1.bmp", "1.bmp", hOutPackage, fileIndexList, fileIdMapList, ItemId, Offset); once for each file merges these files into the large file "Data.dat".
First the small files are merged into one large file (named "Data.dat"); then two tables are built for them, a "file name / file ID (number) table" and a "file ID / file information table", which are saved as the files "Data.map" and "Data.index" respectively.
The implementation is given below as pseudocode:
A) Define the data structures.
The file information item is defined as follows:
struct A3dFileIndexItem
{
    DWORD m_id;       // file ID number
    INT64 m_offset;   // start position of the file within the large file
    int   m_length;   // length of the file in bytes
};
The mapping item between a file ID and a file name is defined as follows:
struct A3dFileIdMapItem
{
    DWORD         m_id;       // file ID number
    unsigned char m_nameLen;  // length of the file name (stored in one byte, so at most 255)
    char*         m_name;     // file name (null-terminated)
};
B) Define five important global variables:
std::list<A3dFileIndexItem*> fileIndexList;   // list of file information records
std::list<A3dFileIdMapItem*> fileIdMapList;   // list of file ID / file name records
DWORD ItemId = 0;                             // ID number for the next small file
INT64 Offset = 0;                             // start position of the next small file
HANDLE hOutPackage;                           // handle of the output large file
// Create the I/O handle of the large file (DataFileName is the path of "Data.dat")
hOutPackage = ::CreateFile(DataFileName,
    GENERIC_READ | GENERIC_WRITE,
    FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL /* | FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING */,
    NULL);
C) Write a function PutshFile that appends the data of one small file to the large file. Its parameters are:
Input parameter filename: the fully qualified path of the small file to be merged into the large file;
Input parameter fileStrId: the unique name that identifies the small file inside the large file;
Input parameter hPackage: the handle of the large file;
In/out parameter std::list<A3dFileIndexItem*>& IndexList: the list of file information records;
In/out parameter std::list<A3dFileIdMapItem*>& IdMapList: the list of file ID / file name records;
In/out parameter DWORD& ItemId: the file ID number to assign next;
In/out parameter INT64& Offset: the start position of the next file in the large file.
Example call: PutshFile("c:/1.bmp", "1.bmp", hOutPackage, fileIndexList, fileIdMapList, ItemId, Offset); Here "c:/1.bmp" is the path of the small file, "1.bmp" is the small file's unique name, and the remaining five arguments are the five global variables defined above. The call appends "c:/1.bmp" to the end of the large file, assigns "1.bmp" a unique ID number, and adds one new record to each of the global lists fileIndexList and fileIdMapList.
The code of the function is as follows:
bool PutshFile(CString filename, CString fileStrId,
               HANDLE hPackage,
               std::list<A3dFileIndexItem*>& IndexList,
               std::list<A3dFileIdMapItem*>& IdMapList,
               DWORD& ItemId, INT64& Offset)
{
    // Open the small file; this handle is used to read its data
    HANDLE hFile = ::CreateFile(filename,
        GENERIC_READ,
        FILE_SHARE_READ, NULL,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    // If opening fails, report that the file could not be read and return
    if (hFile == INVALID_HANDLE_VALUE)
    {
        AfxMessageBox(CString("Read File Fail: ") + filename);
        return false;
    }
    // Use the handle to get the file length (high and low 32 bits)
    DWORD high32 = 0;
    DWORD low32 = ::GetFileSize(hFile, &high32);
    // A length of 0 means the small file has no data; files of 4 GB or more
    // (high32 != 0) are rejected too, because only low32 is used below
    if (high32 != 0 || low32 == 0)
    {
        AfxMessageBox(CString("Invalid file size: ") + filename);
        ::CloseHandle(hFile);
        return false;
    }
    // Buffer holding the small file's data (static, so it is reused between calls)
    DWORD numberOfBytes = 0;
    OVERLAPPED ol;
    static std::vector<char> buffer;
    buffer.resize(low32);
    // Read the small file's data into the buffer
    memset(&ol, 0, sizeof(ol));
    if (!::ReadFile(hFile, &buffer[0], low32, &numberOfBytes, &ol) || numberOfBytes != low32)
    {
        AfxMessageBox(_T("ReadFile error"));
        ::CloseHandle(hFile);
        return false;
    }
    // Append the small file's data at the end of the large file (position Offset)
    memset(&ol, 0, sizeof(ol));
    ol.Offset = (DWORD)(Offset & 0xffffffff);
    ol.OffsetHigh = (DWORD)(Offset >> 32);
    if (!::WriteFile(hPackage, &buffer[0], low32, &numberOfBytes, &ol) || numberOfBytes != low32)
    {
        AfxMessageBox(_T("WriteFile error"));
        ::CloseHandle(hFile);
        return false;
    }
    // Add a new "file ID number / file information" record to fileIndexList
    {
        A3dFileIndexItem* item = new A3dFileIndexItem();  // new record
        item->m_id = ItemId;        // ID number of this small file
        item->m_offset = Offset;    // start position of this small file in the large file
        item->m_length = low32;     // length of this small file
        IndexList.push_back(item);  // add the record to the list
    }
    // Add a new "file name / file ID number" record to fileIdMapList
    {
        CStringA _filename(fileStrId);  // narrow copy of the name (names are stored as char)
        _filename.Replace('/', '\\');
        A3dFileIdMapItem* item = new A3dFileIdMapItem();         // new record
        item->m_id = ItemId;                                     // ID number of this small file
        item->m_nameLen = (unsigned char)_filename.GetLength();  // file name length (names over 255 bytes are not supported)
        item->m_name = new char[item->m_nameLen + 1];            // file name of this small file
        memset(item->m_name, 0, item->m_nameLen + 1);
        strncpy(item->m_name, _filename, item->m_nameLen);
        IdMapList.push_back(item);                               // add the record to the list
    }
    ItemId++;              // after adding a small file, the next file ID is one larger
    Offset += low32;       // and the next start position moves past the data just written
    ::CloseHandle(hFile);  // close the small file's handle and report success
    return true;
}
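PutshFile passes the 64-bit Offset to ::WriteFile by splitting it across the two 32-bit fields OVERLAPPED::Offset and OVERLAPPED::OffsetHigh, and the Write function further below rebuilds a 64-bit value from the two halves returned by GetFileSize. The portable sketch below shows only that arithmetic; OffsetPair, splitOffset and joinOffset are hypothetical names standing in for the OVERLAPPED fields and the inline expressions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for OVERLAPPED::Offset / OVERLAPPED::OffsetHigh.
struct OffsetPair {
    uint32_t low;   // plays the role of ol.Offset
    uint32_t high;  // plays the role of ol.OffsetHigh
};

// Split a non-negative 64-bit file offset into its low and high 32-bit halves.
OffsetPair splitOffset(int64_t offset) {
    assert(offset >= 0 && "file offsets are non-negative");
    const uint64_t u = static_cast<uint64_t>(offset);
    return OffsetPair{static_cast<uint32_t>(u & 0xFFFFFFFFu), static_cast<uint32_t>(u >> 32)};
}

// Rebuild the 64-bit offset, as Write does with the two halves from GetFileSize.
int64_t joinOffset(OffsetPair p) {
    return static_cast<int64_t>((static_cast<uint64_t>(p.high) << 32) | p.low);
}
```

For instance, splitOffset(0x123456789) yields high = 0x1 and low = 0x23456789, and joinOffset turns the pair back into 0x123456789.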
D) As described in step C), call PutshFile repeatedly, once for each small file, to merge every small file into the large file.
E) Write a class A3dFileIdmap that saves the "file name / file ID table" to the file "Data.map", loads the data of "Data.map" back into the class, and lets the user look up a file's ID number from its file name.
class A3dFileIdmap
{
public:
    // Save the "file name / file ID table"
    bool saveFileIdMap(CString filename,
                       std::list<A3dFileIdMapItem*>& IdMapList);
    // Read the data file and load its records into m_filenameIdMap
    bool Load(LPCSTR filename);
    // Look up a file's ID number by its file name
    bool getFileInfo(std::string name, DWORD& id);
private:
    std::map<std::string, A3dFileIdMapItem*> m_filenameIdMap;  // red-black tree mapping file names to records
};
// File header data structure
struct A3dFileIdMapHeader
{
    char  m_LenFileTag;  // length of the file identification
    char* m_filetag;     // file identification
    int   m_version;     // file version
    int   m_iNum;        // number of file name / file ID records
};
// Save the "file name / file ID table"
bool A3dFileIdmap::saveFileIdMap(CString filename,
                                 std::list<A3dFileIdMapItem*>& IdMapList)
{
    A3dFileIdMapHeader header;
    header.m_LenFileTag = 16;                      // length of the file identification
    header.m_filetag = new char[header.m_LenFileTag + 1];
    memset(header.m_filetag, 0, header.m_LenFileTag + 1);
    strcpy(header.m_filetag, "IdMapTableHeader");  // file identification (16 characters)
    header.m_version = 1001;                       // file version
    header.m_iNum = (int)IdMapList.size();         // number of records
    FILE* fp = _tfopen(filename, _T("wb"));        // open the file for writing
    if (fp == NULL)
    {
        delete[] header.m_filetag;
        return false;
    }
    // Write the file header
    fwrite(&header.m_LenFileTag, sizeof(header.m_LenFileTag), 1, fp);  // length of the identification
    fwrite(header.m_filetag, sizeof(char), header.m_LenFileTag, fp);   // identification
    fwrite(&header.m_version, sizeof(header.m_version), 1, fp);        // version number
    fwrite(&header.m_iNum, sizeof(header.m_iNum), 1, fp);              // number of records
    // Write each file ID / file name record in IdMapList
    for (std::list<A3dFileIdMapItem*>::iterator it = IdMapList.begin(); it != IdMapList.end(); ++it)
    {
        A3dFileIdMapItem* item = *it;
        fwrite(&item->m_id, sizeof(DWORD), 1, fp);                 // file ID number
        fwrite(&item->m_nameLen, sizeof(item->m_nameLen), 1, fp);  // file name length
        fwrite(item->m_name, sizeof(char), item->m_nameLen, fp);   // file name
    }
    fclose(fp);  // close the file
    delete[] header.m_filetag;
    return true;
}
// Read the data file and load its records into m_filenameIdMap
bool A3dFileIdmap::Load(LPCSTR filename)
{
    FILE* fp = fopen(filename, "rb");  // open the file read-only
    if (fp == NULL)
        return false;
    A3dFileIdMapHeader header = {};    // m_filetag starts as NULL
    // Read the file header: identification length, identification, version, record count
    if (fread(&header.m_LenFileTag, sizeof(header.m_LenFileTag), 1, fp) != 1)
        goto fail;
    header.m_filetag = new char[header.m_LenFileTag + 1];
    memset(header.m_filetag, 0, header.m_LenFileTag + 1);
    if (fread(header.m_filetag, sizeof(char), header.m_LenFileTag, fp) != (size_t)header.m_LenFileTag)
        goto fail;
    if (fread(&header.m_version, sizeof(header.m_version), 1, fp) != 1)
        goto fail;
    if (fread(&header.m_iNum, sizeof(header.m_iNum), 1, fp) != 1)
        goto fail;
    // Read each record and insert it into the red-black tree
    for (int i = 0; i < header.m_iNum; i++)
    {
        A3dFileIdMapItem* item = new A3dFileIdMapItem();  // m_name starts as NULL
        // Read the file ID number and the length of the file name
        if (fread(&item->m_id, sizeof(DWORD), 1, fp) != 1 ||
            fread(&item->m_nameLen, sizeof(item->m_nameLen), 1, fp) != 1)
        {
            delete item;
            goto fail;
        }
        item->m_name = new char[item->m_nameLen + 1];
        memset(item->m_name, 0, item->m_nameLen + 1);
        // Read the file name
        if (fread(item->m_name, sizeof(char), item->m_nameLen, fp) != item->m_nameLen)
        {
            delete[] item->m_name;
            delete item;
            goto fail;
        }
        // Insert the record into the red-black tree
        m_filenameIdMap.insert(std::make_pair(std::string(item->m_name), item));
    }
    fclose(fp);  // close the file
    delete[] header.m_filetag;
    return true;
fail:
    fclose(fp);
    delete[] header.m_filetag;
    return false;
}
// Look up a file's ID number by its file name
bool A3dFileIdmap::getFileInfo(std::string name, DWORD& id)
{
    std::map<std::string, A3dFileIdMapItem*>::iterator it;
    it = m_filenameIdMap.find(name);  // fast red-black tree lookup
    if (it == m_filenameIdMap.end())  // the name is not in the table
        return false;
    A3dFileIdMapItem* item = it->second;
    id = item->m_id;                  // return the ID number found
    return true;
}
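The byte layout that saveFileIdMap writes and Load reads back can be checked with a portable sketch. It assumes the field sizes of the Windows original (a 1-byte identification length, the 16-byte identification "IdMapTableHeader", a 4-byte version and a 4-byte record count, then for each record a 4-byte ID, a 1-byte name length and the name bytes) and writes integers in little-endian order, which is what fwrite of native integers produces on x86. The function names are hypothetical, and unlike the pseudocode this version also rejects an unexpected identification or version.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>  // std::stringstream is convenient for in-memory testing
#include <string>

// Append `bytes` bytes of v to the stream, least significant byte first.
static void putLE(std::ostream& out, uint32_t v, int bytes) {
    for (int i = 0; i < bytes; ++i) out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read a `bytes`-byte little-endian value; returns false on a short read.
static bool getLE(std::istream& in, uint32_t& v, int bytes) {
    v = 0;
    for (int i = 0; i < bytes; ++i) {
        int c = in.get();
        if (c == std::char_traits<char>::eof()) return false;
        v |= static_cast<uint32_t>(static_cast<unsigned char>(c)) << (8 * i);
    }
    return true;
}

const std::string kMapTag = "IdMapTableHeader";  // 16-byte file identification
const uint32_t kMapVersion = 1001;               // file version

// Write the name -> ID table: header, then one record per file.
void saveIdMap(std::ostream& out, const std::map<std::string, uint32_t>& table) {
    putLE(out, static_cast<uint32_t>(kMapTag.size()), 1);                     // m_LenFileTag
    out.write(kMapTag.data(), static_cast<std::streamsize>(kMapTag.size()));  // m_filetag
    putLE(out, kMapVersion, 4);                                               // m_version
    putLE(out, static_cast<uint32_t>(table.size()), 4);                       // m_iNum
    for (const auto& [name, id] : table) {
        assert(name.size() <= 255 && "the name length is stored in one byte");
        putLE(out, id, 4);                                                    // m_id
        putLE(out, static_cast<uint32_t>(name.size()), 1);                    // m_nameLen
        out.write(name.data(), static_cast<std::streamsize>(name.size()));    // m_name
    }
}

// Read the table back; returns false on a short read or an unexpected header.
bool loadIdMap(std::istream& in, std::map<std::string, uint32_t>& table) {
    uint32_t tagLen = 0, version = 0, count = 0;
    if (!getLE(in, tagLen, 1)) return false;
    std::string tag(tagLen, '\0');
    if (!in.read(&tag[0], static_cast<std::streamsize>(tagLen)) || tag != kMapTag) return false;
    if (!getLE(in, version, 4) || version != kMapVersion || !getLE(in, count, 4)) return false;
    for (uint32_t i = 0; i < count; ++i) {
        uint32_t id = 0, nameLen = 0;
        if (!getLE(in, id, 4) || !getLE(in, nameLen, 1)) return false;
        std::string name(nameLen, '\0');
        if (!in.read(&name[0], static_cast<std::streamsize>(nameLen))) return false;
        table[name] = id;
    }
    return true;
}
```

With this layout the header is 25 bytes and each record costs 5 bytes plus the length of its name.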
F) Write a class A3dFileIndex that saves the "file ID / file information table" to the file "Data.index", loads the data of "Data.index" back into the class, and lets the user look up a file's information from its ID number.
class A3dFileIndex
{
public:
    // Save the "file ID / file information table"
    bool saveFileIndex(CString filename,
                       std::list<A3dFileIndexItem*>& IndexList);
    // Read the data file and load its records into m_filenameIndexMap
    bool Load(LPCSTR filename);
    // Look up a file's information by its ID number
    bool getFileInfo(DWORD id, INT64& offset, int& length);
private:
    std::map<DWORD, A3dFileIndexItem*> m_filenameIndexMap;  // red-black tree mapping file ID numbers to file information records
};
// File header data structure
struct A3dFileIndexHeader
{
    char  m_LenFileTag;  // length of the file identification
    char* m_filetag;     // file identification
    int   m_version;     // file version
    int   m_iNum;        // number of file ID / file information records
};
// Save the "file ID / file information table"
bool A3dFileIndex::saveFileIndex(CString filename,
                                 std::list<A3dFileIndexItem*>& IndexList)
{
    A3dFileIndexHeader fileIndexHeader;
    fileIndexHeader.m_LenFileTag = 16;                      // length of the file identification
    fileIndexHeader.m_filetag = new char[fileIndexHeader.m_LenFileTag + 1];
    memset(fileIndexHeader.m_filetag, 0, fileIndexHeader.m_LenFileTag + 1);
    strcpy(fileIndexHeader.m_filetag, "IndexTableHeader");  // file identification (16 characters)
    fileIndexHeader.m_version = 1001;                       // file version
    fileIndexHeader.m_iNum = (int)IndexList.size();         // number of file ID / file information records
    // Write the file header to the file,
    // then write the records in IndexList to the file.
    // The technique is exactly the same as in A3dFileIdmap::saveFileIdMap and is not repeated here.
    delete[] fileIndexHeader.m_filetag;
    return true;
}
// Read the data file and load its records into m_filenameIndexMap
bool A3dFileIndex::Load(LPCSTR filename)
{
    FILE* fp = fopen(filename, "rb");  // open the file read-only
    if (fp == NULL)
        return false;
    A3dFileIndexHeader header = {};    // m_filetag starts as NULL
    // Read the file header: identification length, identification, version, record count
    if (fread(&header.m_LenFileTag, sizeof(header.m_LenFileTag), 1, fp) != 1)
        goto fail;
    header.m_filetag = new char[header.m_LenFileTag + 1];
    memset(header.m_filetag, 0, header.m_LenFileTag + 1);
    if (fread(header.m_filetag, sizeof(char), header.m_LenFileTag, fp) != (size_t)header.m_LenFileTag)
        goto fail;
    if (fread(&header.m_version, sizeof(header.m_version), 1, fp) != 1)
        goto fail;
    if (fread(&header.m_iNum, sizeof(header.m_iNum), 1, fp) != 1)
        goto fail;
    // Read each record and insert it into the red-black tree
    for (int i = 0; i < header.m_iNum; i++)
    {
        A3dFileIndexItem* item = new A3dFileIndexItem();
        // Read the file ID number, the offset (start position in the large file)
        // and the size of the file
        if (fread(&item->m_id, sizeof(item->m_id), 1, fp) != 1 ||
            fread(&item->m_offset, sizeof(item->m_offset), 1, fp) != 1 ||
            fread(&item->m_length, sizeof(item->m_length), 1, fp) != 1)
        {
            delete item;
            goto fail;
        }
        // Insert the record into the red-black tree
        m_filenameIndexMap.insert(std::make_pair(item->m_id, item));
    }
    fclose(fp);  // close the file
    delete[] header.m_filetag;
    return true;
fail:
    fclose(fp);
    delete[] header.m_filetag;
    return false;
}
// Look up a file's information by its ID number
bool A3dFileIndex::getFileInfo(DWORD id, INT64& offset, int& length)
{
    std::map<DWORD, A3dFileIndexItem*>::iterator it;
    it = m_filenameIndexMap.find(id);    // fast red-black tree lookup
    if (it == m_filenameIndexMap.end())  // the ID is not in the table
        return false;
    A3dFileIndexItem* item = it->second;
    offset = item->m_offset;             // return the offset found
    length = item->m_length;             // and the file length
    return true;
}
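saveFileIndex leaves out its record-writing code, so the exact layout of "Data.index" is not given. Assuming it mirrors "Data.map" with the identification "IndexTableHeader" and stores each record in the order A3dFileIndex::Load reads it (a 4-byte ID, an 8-byte offset and a 4-byte length, 16 bytes per record), the hypothetical sketch below writes and reads that layout in little-endian byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>  // std::stringstream is convenient for in-memory testing
#include <string>

// Hypothetical record matching A3dFileIndexItem.
struct FileInfo {
    uint32_t id;      // file ID number
    int64_t offset;   // start position in the large file
    int32_t length;   // file size in bytes
};

// Append `bytes` bytes of v, least significant byte first.
static void putLE(std::ostream& out, uint64_t v, int bytes) {
    for (int i = 0; i < bytes; ++i) out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read a `bytes`-byte little-endian value; returns false on a short read.
static bool getLE(std::istream& in, uint64_t& v, int bytes) {
    v = 0;
    for (int i = 0; i < bytes; ++i) {
        int c = in.get();
        if (c == std::char_traits<char>::eof()) return false;
        v |= static_cast<uint64_t>(static_cast<unsigned char>(c)) << (8 * i);
    }
    return true;
}

const std::string kIndexTag = "IndexTableHeader";  // 16-byte file identification
const uint64_t kIndexVersion = 1001;               // file version

void saveIndex(std::ostream& out, const std::map<uint32_t, FileInfo>& table) {
    putLE(out, kIndexTag.size(), 1);                                              // m_LenFileTag
    out.write(kIndexTag.data(), static_cast<std::streamsize>(kIndexTag.size()));  // m_filetag
    putLE(out, kIndexVersion, 4);                                                 // m_version
    putLE(out, table.size(), 4);                                                  // m_iNum
    for (const auto& [id, info] : table) {
        assert(info.id == id && "a record's ID must match its key");
        putLE(out, id, 4);                                  // m_id
        putLE(out, static_cast<uint64_t>(info.offset), 8);  // m_offset
        putLE(out, static_cast<uint32_t>(info.length), 4);  // m_length
    }
}

bool loadIndex(std::istream& in, std::map<uint32_t, FileInfo>& table) {
    uint64_t tagLen = 0, version = 0, count = 0;
    if (!getLE(in, tagLen, 1)) return false;
    std::string tag(static_cast<std::string::size_type>(tagLen), '\0');
    if (!in.read(&tag[0], static_cast<std::streamsize>(tagLen)) || tag != kIndexTag) return false;
    if (!getLE(in, version, 4) || version != kIndexVersion || !getLE(in, count, 4)) return false;
    for (uint64_t i = 0; i < count; ++i) {
        uint64_t id = 0, offset = 0, length = 0;
        if (!getLE(in, id, 4) || !getLE(in, offset, 8) || !getLE(in, length, 4)) return false;
        FileInfo info{static_cast<uint32_t>(id), static_cast<int64_t>(offset),
                      static_cast<int32_t>(static_cast<uint32_t>(length))};
        table[info.id] = info;
    }
    return true;
}
```

Because the records have a fixed size, the 8-byte offset field is what lets the large file grow past 4 GB even though each small file's length fits in 32 bits.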
For access to the large file, this example uses the I/O completion port mechanism. A completion port is an I/O model in which calls can be made asynchronously. "Asynchronous" means the following: when worker A and worker B work together, worker A submits a work request to worker B and leaves immediately without waiting for B to respond, and after worker B has finished the request submitted by A, it has a way to notify worker A that the request is complete.
This example sets up one completion port queue to access the large file asynchronously. A class, A3dSingleThreadIocpQueueImplement, creates the completion port and uses it to read and write the large file asynchronously.
The interface that this class exposes to the user consists mainly of the functions WriteFile and ReadFile, which write and read the data of the large file asynchronously.
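The submit-and-notify pattern described above can be sketched portably without a completion port. The class below is a hypothetical single-worker queue built from std::thread, std::mutex and std::condition_variable: post() returns at once (worker A leaves), and the worker thread runs each request and then calls its completion callback (worker B notifies A). It is an analogy only, not how A3dSingleThreadIocpQueueImplement or Windows I/O completion ports are implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical single-threaded work queue illustrating asynchronous submission.
class SingleThreadQueue {
public:
    SingleThreadQueue() : worker_([this] { run(); }) {}

    ~SingleThreadQueue() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains pending requests before exiting
    }

    // "Worker A": hand over a request and return without waiting for it.
    void post(std::function<void()> work, std::function<void()> onDone) {
        assert(work && onDone);
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push(Job{std::move(work), std::move(onDone)});
        }
        cv_.notify_one();
    }

private:
    struct Job {
        std::function<void()> work;    // the request itself
        std::function<void()> onDone;  // completion notification
    };

    // "Worker B": execute requests in order, then notify the submitter.
    void run() {
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job.work();
            job.onDone();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Job> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

Because every request runs on the one worker thread, requests complete in the order they were posted, which loosely mirrors the "single thread" in the class name.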
The main code of the class A3dSingleThreadIocpQueueImplement is as follows:
class A3dSingleThreadIocpQueueImplement
{
private:
    HANDLE m_hIocp;                      // handle of the completion port
    HANDLE m_hWaitEvent;                 // event
    HANDLE m_hCacheFile;                 // handle of the large file (opened for overlapped I/O)
    DWORD m_dwBlockSize;                 // size of one read/write segment (usually the system page size)
    A3dFileIndex* m_pFileIndex;          // "file ID / file information table"
    enum IocpOperation { DestroyIocp };  // completion port operation types
protected:
    // Context of one read or write operation. OVERLAPPED must be the first member,
    // because a ReadWriteContext* is passed to the system as an LPOVERLAPPED.
    struct ReadWriteContext
    {
        OVERLAPPED ol;                 // overlapped I/O structure for the completion port
        bool bRead;                    // true for a read operation, false for a write
        std::vector<char>* buffer;     // buffer holding the file data
        DWORD fileId;                  // file ID number
        int length;                    // file size
        int icurLength;                // number of bytes already completed
        INT64 offset;                  // offset of the file in the large file
        A3dIReceiver* receiver;        // notification object used for the callback
        CRITICAL_SECTION contextLock;  // protects this context
    };
    static DWORD WINAPI _WorkerThreadProc(LPVOID lpParam);  // worker thread processing logic
    ~A3dSingleThreadIocpQueueImplement() {}
    // Post an event to the completion port
    virtual void Post(unsigned long numberOfBytes, OVERLAPPED* overlapped);
    // Handle a completion port event
    virtual void QueuedEvent(DWORD numberOfBytes, LPOVERLAPPED overlapped, BOOL success);
    // Do the actual work of reading file data into memory
    void Read(ReadWriteContext* context);
    // Do the actual work of writing file data into the large file
    void Write(ReadWriteContext* context);
public:
    A3dSingleThreadIocpQueueImplement();
    // Write the data in memory passed as a parameter into the large file
    virtual bool WriteFile(DWORD fileId, std::vector<char>* buffer);
    // Read the data of one file from the large file into memory
    virtual void ReadFile(DWORD fileId, int fileLength, A3dIReceiver* receiver);
};
The key implementation functions of the class are as follows.
The function WriteFile writes file data into the large file; the parameter fileId is the file ID number, and the parameter std::vector<char>* buffer is the file data in memory.
bool A3dSingleThreadIocpQueueImplement::WriteFile(DWORD fileId,
                                                  std::vector<char>* buffer)
{
    ReadWriteContext* context = new ReadWriteContext();  // new record for one read/write operation (zero-initialized)
    ::InitializeCriticalSection(&context->contextLock);
    ::EnterCriticalSection(&context->contextLock);
    context->bRead = false;                            // mark as a write operation
    context->buffer = new std::vector<char>(*buffer);  // copy the data into the structure
    context->fileId = fileId;                          // copy the ID number into the structure
    context->length = (int)context->buffer->size();    // copy the size into the structure
    context->icurLength = 0;                           // nothing has been written yet
    context->offset = 0;                               // the offset is not known yet, so set it to 0
    ::LeaveCriticalSection(&context->contextLock);
    Write(context);  // issues the actual write to the large file
    return true;
}
The function ReadFile reads the data of one file from the large file into memory. The parameter fileId is the file ID number, fileLength is the file size, and receiver is the notification object used for the callback: once the completion port finishes the read operation, it notifies receiver that the read is complete.
void A3dSingleThreadIocpQueueImplement::ReadFile(DWORD fileId, int fileLength,
                                                 A3dIReceiver* receiver)
{
    INT64 offset;
    // Look up the "file ID / file information table" to get the file's size and start position from its ID
    if (m_pFileIndex->getFileInfo(fileId, offset, fileLength))
    {
        ReadWriteContext* context = new ReadWriteContext();  // new record for one read/write operation (zero-initialized)
        ::InitializeCriticalSection(&context->contextLock);
        ::EnterCriticalSection(&context->contextLock);
        context->bRead = true;         // mark as a read operation
        // Allocate memory for the file data, rounded up to a whole number of blocks
        context->buffer = new std::vector<char>(
            (fileLength + m_dwBlockSize - 1) & ~(m_dwBlockSize - 1));
        context->fileId = fileId;      // copy the ID number into the structure
        context->length = fileLength;  // copy the size into the structure
        context->icurLength = 0;       // nothing has been read yet
        context->offset = offset;      // copy the offset into the structure
        context->receiver = receiver;  // copy the callback notification object into the structure
        ::LeaveCriticalSection(&context->contextLock);
        Read(context);  // issues the actual read from the large file; see the description of Read
    }
}
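ReadFile sizes its buffer with (fileLength + m_dwBlockSize - 1) & ~(m_dwBlockSize - 1), which rounds the file length up to a whole number of blocks. The mask trick is only valid when the block size is a power of two, as system page sizes are; with a 4096-byte block, for example, a 5000-byte file gets (5000 + 4095) & ~4095 = 8192 bytes. The helper roundUpToBlock below is a hypothetical name that isolates the expression and checks the power-of-two assumption.

```cpp
#include <cassert>
#include <cstdint>

// Round `length` up to the next multiple of `blockSize` (a power of two).
uint32_t roundUpToBlock(uint32_t length, uint32_t blockSize) {
    // The mask ~(blockSize - 1) clears the low bits only if blockSize is a power of two.
    assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);
    return (length + blockSize - 1) & ~(blockSize - 1);
}
```

Rounding up presumably lets the last segment be read as a full block, so the buffer can hold a little more than the file's real length; only the first fileLength bytes are file data.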
// Write does the actual work of writing file data into the large file. Note that it is a private
// function called only by WriteFile. Because the large file is accessed through overlapped I/O on a
// completion port, each call writes one segment of the small file's data (usually the system page
// size) to the large file. The user leaves as soon as WriteFile returns; the completion port then
// calls this function, and after a segment has been written it checks whether the whole small file
// has been written. If not, this function is called again, until the small file has been written
// completely, at which point the user is notified that the write has finished.
void A3dSingleThreadIocpQueueImplement::Write(ReadWriteContext* context)
{
    ::EnterCriticalSection(&context->contextLock);
    DWORD highSize;
    DWORD lowSize;
    lowSize = ::GetFileSize(m_hCacheFile, &highSize); // current length of the big file: low 32 bits returned, high 32 bits in highSize
    memset(&context->ol, 0, sizeof(context->ol));
    context->ol.Offset = lowSize;      // write position, low 32 bits: append at the end of the big file
    context->ol.OffsetHigh = highSize; // write position, high 32 bits
    if (context->icurLength == 0)      // first chunk: record the small file's start position in the big file
        context->offset = ((INT64)highSize) << 32 | (INT64)lowSize;
    DWORD numberOfBytes;
    int nLeft = context->length - context->icurLength; // remaining size = file size minus bytes already written
    // size of this write: the smaller of the remaining size and the block (page) size
    DWORD nToWrite = (DWORD)nLeft > m_dwBlockSize ? m_dwBlockSize : (DWORD)nLeft;
    // issue the write and handle errors; (LPOVERLAPPED)context is only valid if ol is the first member of ReadWriteContext
    if (FALSE == ::WriteFile(m_hCacheFile, (LPVOID)&(*context->buffer)[context->icurLength],
                             nToWrite, &numberOfBytes, (LPOVERLAPPED)context))
    {
        switch (DWORD d = ::GetLastError())
        {
        case ERROR_HANDLE_EOF:
            break;
        case ERROR_IO_PENDING: // normal for overlapped I/O: completion is reported through the port
            break;
        default:
            throw d;
        }
    }
    ::LeaveCriticalSection(&context->contextLock);
}
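The chunking in Write can be checked on its own. The sketch below is a simplification with two stand-ins: an in-memory byte vector replaces m_hCacheFile, and an ordinary loop replaces the completion port re-invoking Write after each chunk. The function name appendInChunks is hypothetical. Each chunk is the smaller of the remaining size and the block size. The small file's start offset is the big file's length before the first chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the Write logic without Windows I/O.
// Returns the offset at which the small file starts in the big file.
std::uint64_t appendInChunks(std::vector<char>& bigFile,
                             const std::vector<char>& smallFile,
                             std::size_t blockSize,
                             std::vector<std::size_t>* chunkSizes = nullptr) {
    assert(blockSize > 0);
    // Like Write when icurLength == 0: the start offset is the current end of the big file.
    const std::uint64_t startOffset = bigFile.size();
    std::size_t done = 0;  // plays the role of context->icurLength
    while (done < smallFile.size()) {
        const std::size_t left = smallFile.size() - done;       // nLeft
        const std::size_t toWrite = std::min(left, blockSize);   // nToWrite
        const char* src = smallFile.data() + done;
        bigFile.insert(bigFile.end(), src, src + toWrite);       // append one chunk
        if (chunkSizes) chunkSizes->push_back(toWrite);
        done += toWrite;
    }
    return startOffset;
}
```

Take a 10000-byte small file appended to a 10-byte big file with 4096-byte blocks. It is written as chunks of 4096, 4096 and 1808 bytes, and its recorded offset is 10.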
// Read performs the actual reading of data from the big file into memory. Note that it is a private function and may only be called by ReadFile.
// Because overlapped I/O is used, each call reads one chunk of data (generally the system page size) from the big file and stores it in memory.
// The caller returns as soon as it has called ReadFile; the completion port then calls this function. After each chunk has been read into memory,
// the completion port checks whether the whole small file has been read. If not, it calls this function again, until all of the small file's
// data has been read, at which point the user is notified that the read has finished.
void A3dSingleThreadIocpQueueImplement::Read(ReadWriteContext* context)
{
    ::EnterCriticalSection(&context->contextLock);
    {
        memset(&context->ol, 0, sizeof(context->ol));
        context->ol.Offset = (DWORD)(context->offset & 0xffffffff); // read position, low 32 bits
        context->ol.OffsetHigh = (DWORD)(context->offset >> 32);    // read position, high 32 bits
        DWORD numberOfBytes;
        DWORD nToRead;
        int nLeft = context->length - context->icurLength; // remaining size = file size minus bytes already read
        if (context->icurLength % m_dwBlockSize == 0)        // bytes read so far end on a block boundary
        {
            // size of this read: the smaller of the remaining size and the block (page) size
            nToRead = (DWORD)nLeft > m_dwBlockSize ? m_dwBlockSize : (DWORD)nLeft;
        }
        else
        {
            // bytes left in the partially read block, i.e. up to the next block boundary
            // (parenthesized so the mask is applied before the subtraction)
            DWORD nLastReadLeft =
                ((context->icurLength + m_dwBlockSize - 1) & ~(m_dwBlockSize - 1)) - context->icurLength;
            // size of this read: the smaller of the remaining size and the rest of the partial block
            nToRead = (DWORD)nLeft > nLastReadLeft ? nLastReadLeft : (DWORD)nLeft;
        }
        // issue the read and handle errors
        if (FALSE == ::ReadFile(m_hCacheFile,
                                (LPVOID)&(*context->buffer)[context->icurLength],
                                nToRead, &numberOfBytes, (LPOVERLAPPED)context))
        {
            switch (DWORD d = ::GetLastError())
            {
            case ERROR_HANDLE_EOF:
                break;
            case ERROR_IO_PENDING: // normal for overlapped I/O: completion is reported through the port
                break;
            default:
                throw d;
            }
        }
    }
    ::LeaveCriticalSection(&context->contextLock);
}
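The chunk-size rule in Read and the 64-bit offset split can be written as plain functions. This is a sketch that assumes a power-of-two block size. The names nextReadSize and splitOffset are hypothetical. When the bytes read so far end on a block boundary, at most one whole block is read. Otherwise the read stops at the next boundary, so that subsequent reads are block-aligned again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the chunk-size decision in Read.
// done = bytes already read (icurLength), total = file size (length).
std::uint32_t nextReadSize(std::uint32_t done, std::uint32_t total,
                           std::uint32_t blockSize) {
    assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);
    assert(done <= total);
    const std::uint32_t left = total - done;  // nLeft
    if (done % blockSize == 0) {
        return std::min(left, blockSize);     // aligned: at most one block
    }
    // Unaligned: read only up to the next block boundary (nLastReadLeft).
    const std::uint32_t toBoundary =
        ((done + blockSize - 1) & ~(blockSize - 1)) - done;
    return std::min(left, toBoundary);
}

// Split a 64-bit file offset into the two 32-bit halves of an OVERLAPPED
// structure (Offset = low 32 bits, OffsetHigh = high 32 bits).
void splitOffset(std::uint64_t offset, std::uint32_t& low, std::uint32_t& high) {
    low = static_cast<std::uint32_t>(offset & 0xFFFFFFFFu);
    high = static_cast<std::uint32_t>(offset >> 32);
}
```

Suppose 1000 bytes of a 10000-byte file have been read with 4096-byte blocks. The next read is 3096 bytes, which brings the position back onto the 4096 boundary.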
To read a small file, the "file name to file ID mapping table" file and the "file ID to file info mapping table" file must be loaded first. The global variable "A3dFileIdmap* g_pFileIDmap" holds a pointer to the "file name to file ID mapping table" class, and the global variable "A3dFileIndexmap* g_pFileIndexmap" holds a pointer to the "file ID to file information table" class. Calling the Load function of each of these two classes loads the two tables from their files into memory.
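The patent does not describe the on-disk layout of the two table files. Purely as an illustration, the sketch below assumes a whitespace-separated text layout: "name id" and "id offset length", one entry per line, with names that contain no spaces. It loads each table into a std::map. The names loadNameToId, loadIdToInfo and FileInfo are hypothetical and are not the Load functions of A3dFileIdmap or A3dFileIndexmap.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one "file ID to file info" entry.
struct FileInfo {
    std::uint64_t offset;  // start position of the small file in the big file
    std::uint32_t length;  // size of the small file in bytes
};

// Loads an assumed "name id" per-line table.
std::map<std::string, std::uint32_t> loadNameToId(std::istream& in) {
    std::map<std::string, std::uint32_t> table;
    std::string name;
    std::uint32_t id;
    while (in >> name >> id) table[name] = id;
    return table;
}

// Loads an assumed "id offset length" per-line table.
std::map<std::uint32_t, FileInfo> loadIdToInfo(std::istream& in) {
    std::map<std::uint32_t, FileInfo> table;
    std::uint32_t id;
    FileInfo info{};
    while (in >> id >> info.offset >> info.length) table[id] = info;
    return table;
}
```

Because both loaders take a std::istream, they work equally with a std::ifstream opened on a table file or a std::istringstream in tests.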
For example, to read the file "1.bmp", first call "DWORD id; g_pFileIDmap->getFileInfo(std::string("1.bmp"), id);" to obtain the file ID of "1.bmp". Then call "INT64 offset; int length; g_pFileIndexmap->getFileInfo(id, offset, length);" to obtain the file's length. Next, set up "receiver", the callback object that receives the completion notification. Finally, call the completion port function A3dSingleThreadIocpQueueImplement::ReadFile(id, length, receiver). The application can then return immediately. Once the completion port has finished reading the file, it notifies the receiver object and delivers the file contents to it.
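The lookup chain in this example (file name to file ID, file ID to offset and length, then a positioned read) can be sketched synchronously. This is a simplified illustration, not the completion-port implementation. readSmallFile and FileInfo are hypothetical, the tables are plain std::map objects, and the big file is any std::istream.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the "file ID to file info" table.
struct FileInfo {
    std::uint64_t offset;  // start of the small file within the big file
    std::uint32_t length;  // size of the small file in bytes
};

// Synchronous sketch: name -> file ID -> file info -> positioned read.
std::optional<std::vector<char>> readSmallFile(
        const std::string& name,
        const std::map<std::string, std::uint32_t>& nameToId,
        const std::map<std::uint32_t, FileInfo>& idToInfo,
        std::istream& bigFile) {
    auto idIt = nameToId.find(name);
    if (idIt == nameToId.end()) return std::nullopt;    // unknown file name
    auto infoIt = idToInfo.find(idIt->second);
    if (infoIt == idToInfo.end()) return std::nullopt;  // no file info for this ID
    const FileInfo& info = infoIt->second;
    std::vector<char> data(info.length);
    bigFile.clear();  // reset any earlier EOF/fail state before seeking
    bigFile.seekg(static_cast<std::streamoff>(info.offset));
    if (!bigFile.read(data.data(), static_cast<std::streamsize>(data.size())))
        return std::nullopt;                             // seek failed or short read
    return data;
}
```

If "1.bmp" has ID 7 and ID 7 maps to offset 5 and length 5, reading from a big file containing "HELLOworld!" yields "world".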
To save the file "1.bmp", first call "DWORD id; g_pFileIDmap->getFileInfo(std::string("1.bmp"), id);" to obtain the file ID of "1.bmp". Then, with the file's contents held in memory in the variable "std::vector<char>* buffer", call the completion port function A3dSingleThreadIocpQueueImplement::WriteFile(id, buffer). The application can then return immediately, and the completion port completes the write operation.
In summary, a large batch of small files is merged into one big file. A one-to-one correspondence is established between each small file's file name and its file number, and a correspondence is established between each file number and the small file's file information, which includes the small file's position in the big file. Knowing the name of a small file to be accessed is therefore enough to obtain its file number. The file number gives its file information, i.e. its position in the big file. The small file's data can then be read through the IO interface of the big file, which greatly saves system resources.
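The storage side summarized above can likewise be modelled in memory. The sketch makes three assumptions: unique file names, sequential file numbers starting at 0, and a byte vector standing in for the big file. The names pack, PackedStore and FileInfo are hypothetical. Each small file's recorded offset is simply the big file's length just before its data is appended.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical file information: position and size of a small file in the big file.
struct FileInfo {
    std::uint64_t offset;
    std::uint32_t length;
};

// Hypothetical in-memory model of the big file plus its two tables.
struct PackedStore {
    std::vector<char> bigFile;                       // merged data of all small files
    std::map<std::string, std::uint32_t> nameToId;   // file name -> file number
    std::map<std::uint32_t, FileInfo> idToInfo;      // file number -> file information
};

// Merges (name, contents) pairs into one store; assumes names are unique.
PackedStore pack(const std::vector<std::pair<std::string, std::string>>& smallFiles) {
    PackedStore store;
    std::uint32_t nextId = 0;  // sequential numbering is an assumption
    for (const auto& [name, contents] : smallFiles) {
        const std::uint32_t id = nextId++;
        store.nameToId[name] = id;
        store.idToInfo[id] = FileInfo{static_cast<std::uint64_t>(store.bigFile.size()),
                                      static_cast<std::uint32_t>(contents.size())};
        store.bigFile.insert(store.bigFile.end(), contents.begin(), contents.end());
    }
    return store;
}
```

For two files, "1.bmp" (3 bytes) and "2.bmp" (5 bytes), "2.bmp" gets file number 1, offset 3 and length 5, and the big file is 8 bytes long.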

Claims (3)

1. A mass file data storing method, characterized in that it comprises the steps of:
Merging the data of all small files into one big file;
Establishing a one-to-one correspondence between the file name of each small file and its file number;
Establishing a correspondence between each file number and the file information of the small file, the file information including the position of the small file in the big file.
2. The mass file data storing method of claim 1, characterized in that the file information further includes the file size of the small file.
3. A mass file data reading method for reading file data stored by the method of the preceding claims, characterized in that it comprises the steps of:
Obtaining the file number of a small file from its file name;
Obtaining the file information of the small file from the file number;
Obtaining the position of the small file in the big file from the file information;
Reading the small file's data through the IO interface of the big file, according to the position of the small file in the big file.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007101990028A CN101452465A (en) 2007-12-05 2007-12-05 Mass file data storing and reading method

Publications (1)

Publication Number Publication Date
CN101452465A true CN101452465A (en) 2009-06-10

Family

ID=40734701

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007101990028A Pending CN101452465A (en) 2007-12-05 2007-12-05 Mass file data storing and reading method

Country Status (1)

Country Link
CN (1) CN101452465A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102110106A (en) * 2009-12-23 2011-06-29 新奥特(北京)视频技术有限公司 Image-text packing server and method for maintaining index files in server
CN102110105A (en) * 2009-12-23 2011-06-29 新奥特(北京)视频技术有限公司 Method and device of picture-text packaging system for reading folder
CN102110106B (en) * 2009-12-23 2015-04-29 新奥特(北京)视频技术有限公司 Image-text packing server and method for maintaining index files in server
CN102110105B (en) * 2009-12-23 2015-04-29 新奥特(北京)视频技术有限公司 Method and device of picture-text packaging system for reading folder
CN101854388A (en) * 2010-05-17 2010-10-06 浪潮(北京)电子信息产业有限公司 Method and system concurrently accessing a large amount of small documents in cluster storage
CN103797480A (en) * 2011-09-14 2014-05-14 富士通株式会社 Extraction method, extraction program, extraction device, and extraction system
CN103064843A (en) * 2011-10-20 2013-04-24 北京中搜网络技术股份有限公司 Data processing device and data processing method
CN103064843B (en) * 2011-10-20 2016-03-16 北京中搜网络技术股份有限公司 Data processing equipment and data processing method
CN103077166A (en) * 2011-10-25 2013-05-01 深圳市快播科技有限公司 Spatial multiplexing method and device for small file storage
CN103077166B (en) * 2011-10-25 2016-08-03 深圳市天趣网络科技有限公司 The method for spacial multiplex of small documents storage and device
CN103580989A (en) * 2012-07-31 2014-02-12 腾讯科技(深圳)有限公司 Junk mail processing method and system
CN103580989B (en) * 2012-07-31 2018-07-24 腾讯科技(深圳)有限公司 Junk mail processing method and system
CN103678293A (en) * 2012-08-29 2014-03-26 百度在线网络技术(北京)有限公司 Data storage method and device
CN104142937A (en) * 2013-05-07 2014-11-12 深圳中兴网信科技有限公司 Method, device and system for distributed data access
CN103324552A (en) * 2013-06-06 2013-09-25 西安交通大学 Two-stage single-instance data de-duplication backup method
CN103324552B (en) * 2013-06-06 2016-01-13 西安交通大学 Two benches list example duplicate removal data back up method
CN104424049A (en) * 2013-09-02 2015-03-18 联想(北京)有限公司 Data processing method and electronic device
CN104424049B (en) * 2013-09-02 2018-06-01 联想(北京)有限公司 A kind of data processing method and electronic equipment
CN104572670A (en) * 2013-10-15 2015-04-29 方正国际软件(北京)有限公司 Small file storage, query and deletion method and system
CN104572670B (en) * 2013-10-15 2019-07-23 方正国际软件(北京)有限公司 A kind of storage of small documents, inquiry and delet method and system
CN103559229A (en) * 2013-10-22 2014-02-05 西安电子科技大学 Small file management service (SFMS) system based on MapFile and use method thereof
CN104978351A (en) * 2014-04-09 2015-10-14 中国电信股份有限公司 Backup method of mass small files and cloud store gateway
CN104123237A (en) * 2014-06-24 2014-10-29 中电科华云信息技术有限公司 Hierarchical storage method and system for massive small files
CN104778270A (en) * 2015-04-24 2015-07-15 成都汇智远景科技有限公司 Storage method for multiple files
CN109039804A (en) * 2018-07-12 2018-12-18 武汉斗鱼网络科技有限公司 A kind of file reading and electronic equipment
CN109039804B (en) * 2018-07-12 2020-08-25 武汉斗鱼网络科技有限公司 File reading method and electronic equipment
CN109101598A (en) * 2018-07-31 2018-12-28 成都华栖云科技有限公司 A kind of small page picture rendering method

Similar Documents

Publication Publication Date Title
CN101452465A (en) Mass file data storing and reading method
US10140461B2 (en) Reducing resource consumption associated with storage and operation of containers
JP5387757B2 (en) Parallel data processing system, parallel data processing method and program
US7676481B2 (en) Serialization of file system item(s) and associated entity(ies)
US8868626B2 (en) System and method for controlling a file system
US8996468B1 (en) Block status mapping system for reducing virtual machine backup storage
CN102436408B (en) Data storage cloud and cloud backup method based on Map/Dedup
CN102629247B (en) Method, device and system for data processing
US20130325915A1 (en) Computer System And Data Management Method
US20090019223A1 (en) Method and systems for providing remote strage via a removable memory device
US8095678B2 (en) Data processing
CN103020315A (en) Method for storing mass of small files on basis of master-slave distributed file system
CN100498781C (en) Method for storing metadata of logic document system by adhesion property
US20110239231A1 (en) Migrating electronic document version contents and version metadata as a collection with a single operation
CN103460197A (en) Computer system, file management method and metadata server
US11836116B2 (en) Managing operations between heterogeneous file systems
US20080005524A1 (en) Data processing
JP2022501747A (en) Data backup methods, equipment, servers and computer programs
US9298733B1 (en) Storing files in a parallel computing system based on user or application specification
Cruz et al. A scalable file based data store for forensic analysis
US7167872B2 (en) Efficient file interface and method for providing access to files using a JTRS SCA core framework
CN103197987A (en) Data backup method, data recovery method and cloud storage system
US7836079B2 (en) Virtual universal naming convention name space over local file system
CN103210389B (en) A kind for the treatment of method and apparatus of metadata
US20140040191A1 (en) Inventorying and copying file system folders and files

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20090610