WO2017211651A1 - Devices and methods for core dump deduplication

Info

Publication number: WO2017211651A1
Application number: PCT/EP2017/063211
Authority: WO
Inventors: Christoph Neumann; Nicolas Le Scouarnec
Original assignee: Thomson Licensing
Priority date: 2016-06-08
Filing date: 2017-05-31
Publication date: 2017-12-14

Abstract

Devices and methods for core dump deduplication. After a core dump in a device (310), the device scans (S43) the memory for memory pages that contain segments to be included in the core dump file. For each memory page, the device hashes (S44) the memory page to obtain a hash value that is sent (S45) to the server (320) with the location of the memory page in the core dump file. The server verifies (S46) whether it stores a memory page whose hash value matches the received hash value. If not, the server requests (S47) the memory page from the device (310) that returns (S48) the memory page, which is stored; if so, the server stores (S49) a reference to the existing memory page in the core dump file. The server then removes (S51) virtual addresses from the memory pages and, for a memory page, generates (S52) a second hash value and verifies (S53) whether it stores an identical hash value. If not, the server stores (S54) the memory page and the second hash value; if so, the server stores (S55) a link to a memory page corresponding to the stored identical hash value.

Description

DEVICES AND METHODS FOR CORE DUMP DEDUPLICATION

TECHNICAL FIELD

The present disclosure relates generally to computer software and in particular to deduplication of core dumps from devices executing the computer software.

BACKGROUND

This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.

One way to improve the quality of software is to identify problems users face with software. Crashes, often caused by a so-called segmentation fault (often shortened to "segfault"), are one kind of these problems. As crashes are rare in released products, they are difficult to obtain and hence to analyse. For this reason, current practice is to have devices with deployed software send crash reports to a server, as is for example the case for the Ubuntu and Microsoft Windows operating systems. Analysis can then be used to pinpoint the crashes that occur the most often, on what hardware and for what version of software.

In Linux, for example, when a segmentation fault occurs, the Linux kernel can generate a core dump, which is a dump of a subset of the memory and of the register state at the time of the crash. This core dump can be sent in the crash report to the server.

Since the Linux kernel dumps registers first and then scans through segment by segment in virtual address order, the core dump depends on the memory layout. Figure 1 shows the memory layout for an exemplary conventional Linux process, where the distinct bands in the address space (which begins at the bottom) correspond to memory segments: the heap, the stack, and so on. A Linux core dump therefore first contains memory segments on the heap and last the segments related to the stack. It is noted that one segment is normally composed of one or more memory pages.

The Linux core dumps are generated and stored in the Executable and Linking Format (ELF), execution view [see Committee, T. (1995). Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification Version 1.2. TIS Committee, (May)]. Figure 2 shows an exemplary file 20 in the conventional ELF file format execution view. The file 20 starts with an ELF header 22 that describes the organisation of the rest of the file. A program header table 23 locates segment images 24, 25 within the file that finishes with an optional section header table 27.

Annex I comprises an example core dump file as it looks opened with the tool readelf. In the example, the program header part contains a mapping of virtual addresses to file offsets. Each mapping corresponds to the start of a segment in the core dump file.

Core dump files can be quite big and thus require much bandwidth to transmit and much memory to store. A common technique to save on memory usage and storage volume, as well as bandwidth and energy consumption, is deduplication. This is a form of compression and its principle is to save one single copy of a data object and to create only pointers for other copies of the data object. Deduplication is implemented in various environments and applicable to different types of data objects, as will be described hereinafter.

A running operating system can have a large number of memory pages that hold identical data. Memory deduplication optimizes the usage of memory by merging duplicate memory pages. To do so, a process regularly scans through the memory to find pairs of pages with identical data, merges these pages into one single page, and maps the page to both original locations. Linux implements memory deduplication using a kernel module called Kernel Samepage Merging (KSM).

With file level deduplication, a single copy of each file is stored. Two or more files are identified as identical if they have the same hash value. Block level deduplication cuts files into blocks and stores only a single copy of each block. File level deduplication and block level deduplication are typically implemented on storage servers, e.g. in the cloud.
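By way of illustration only, block level deduplication can be sketched as follows. The BlockStore class, the block size of 4096 bytes and the use of SHA-256 are assumptions made for this example and are not prescribed by the present description.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


class BlockStore:
    """Hypothetical block level deduplicating store."""

    def __init__(self):
        self.blocks = {}  # hash value -> block content, one copy per distinct block
        self.files = {}   # file name -> ordered list of block hash values

    def put(self, name, data):
        """Cut data into fixed-size blocks and keep a single copy of each block."""
        recipe = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # stored only if not yet present
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        """Rebuild a file from its list of block hash values."""
        return b"".join(self.blocks[digest] for digest in self.files[name])


store = BlockStore()
file_a = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
file_b = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # shares its first block with file_a
store.put("a", file_a)
store.put("b", file_b)
```

In this example the two files hold four blocks in total, of which only three distinct blocks are stored.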

However, conventional deduplication is not suitable for core dumps as almost every core dump file is different from another core dump file, even if it is from the same program. A reason for this is that core dump files encompass the stack, with the memory addresses of return functions and parameter values and these change from one run to another, especially when Address Space Layout Randomization (ASLR) is used to enhance the security of the program. In addition, a core dump also contains the heap with the different variable values and the program header files contain memory addresses that are different for each execution and hence at each crash. Applying conventional file level deduplication would thus be largely ineffective.

Block level deduplication could be more effective, as some portions of different core dumps may be the same. Still, it is far from trivial how to cut a core dump into smaller blocks. Finally, because of the memory and storage constraints, it is generally not an option to generate a file and then cut it into blocks that are deduplicated; the processing has to be online.

Blocks could be of fixed size or, to allow deduplication even in the case of unaligned data, be variable-length blocks with a pattern marking the end of a block, as described in Muthitacharoen, Athicha, Benjie Chen, and David Mazieres. "A low-bandwidth network file system." ACM SIGOPS Operating Systems Review. Vol. 35. No. 5. ACM, 2001.
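A much simplified sketch of such variable-length blocks is given below. It places a block boundary wherever the sum of the most recent bytes matches a bit pattern; the cited work uses a stronger fingerprint over a sliding window, so the plain byte sum, the window length and the mask used here are assumptions made purely for illustration.

```python
import random

WINDOW = 16  # number of bytes in the sliding window (assumed)
MASK = 0x3F  # a boundary is placed when the low 6 bits of the window sum are zero (assumed)


def chunk(data):
    """Split data into blocks whose end is marked by a pattern in the content."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        if i >= WINDOW - 1 and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
shifted = b"extra bytes" + original  # same content, no longer aligned
chunks_original = chunk(original)
chunks_shifted = chunk(shifted)
```

Because a boundary depends only on the bytes inside the window, every block of the original data after its first block reappears unchanged in the shifted data, which would not be the case with fixed-size blocks.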

The file deduplication and the block deduplication already described are mainly used to save storage resources. In some circumstances they also allow saving bandwidth. For example, when a client wants to upload a file to a data storage service, the following protocol can be executed: the client first sends a hash of the file to the server that checks if the corresponding file is already stored (i.e., if the file has been uploaded before by another client) and notifies the client accordingly. Then the client only uploads the file if the server does not already have a copy of the file.

It will thus be appreciated that there is a desire for a solution that addresses at least some of the shortcomings of the conventional solutions. The present principles provide such a solution.

SUMMARY OF DISCLOSURE

In a first aspect, the present principles are directed to a method for deduplication of a core dump file. A hardware processor of a device scans a memory of the device for at least one memory page containing segments to be included in a core dump file, hashes the at least one memory page to obtain a hash value for the at least one memory page, sends the hash value for the at least one memory page and at least one location of the at least one memory page in the core dump file, and upon reception of a request for the at least one memory page, returns the at least one memory page.

In a second aspect, the present principles are directed to a device for deduplication of a core dump file. The device comprises memory for storing the core dump file and a hardware processor coupled to the memory and configured to scan the memory for at least one memory page containing segments to be included in a core dump file to be sent; for each memory page to be included in the core dump file to be sent: hash the memory page to obtain a hash value for the memory page, and send the hash value for the memory page and the location of the memory page in the core dump file; and upon reception of a request for a memory page, return the memory page of the request.

In a third aspect, the present principles are directed to a method for deduplication of a core dump file. A hardware processor of an apparatus receives a hash value for a memory page of a core dump file and a location of the memory page in the core dump file; verifies whether a memory page whose hash value matches the received hash value is stored in memory of the apparatus; in case the memory does not store such a memory page, requests the memory page, receives the requested memory page in response, and stores the requested memory page and the hash value; and in case the memory stores such a memory page, stores a reference to the memory page in the memory within the core dump file.

In a fourth aspect, the present principles are directed to apparatus for deduplication of a core dump file. The apparatus comprises memory configured to store memory pages of a core dump file and hash values corresponding to the memory pages and a hardware processor configured to: receive hash values for memory pages of the core dump file and locations of the memory pages; for each received hash value, verify whether a memory page whose hash value matches the received hash value is stored in the memory; in case the memory does not store such a memory page, request the memory page, receive the requested memory page, and store the received memory page and the hash value in the memory; and in case the memory stores such a memory page, store a reference to the existing memory page in the core dump file.

In a fifth aspect, the present principles are directed to a computer program comprising program code instructions executable by a processor for implementing the steps of a method according to any embodiment of the first aspect.

In a sixth aspect, the present principles are directed to a computer program product which is stored on a non-transitory computer readable medium and comprises program code instructions executable by a processor for implementing the steps of a method according to any embodiment of the first aspect.

BRIEF DESCRIPTION OF DRAWINGS

Preferred features of the present principles will now be described, by way of non-limiting example, with reference to the accompanying drawings, in which:

Figure 1 shows the memory layout for an exemplary conventional Linux process;

Figure 2 shows an exemplary file in the conventional ELF file format execution view;

Figure 3 illustrates an exemplary system 300 implementing the present principles;

Figure 4 is a flowchart illustrating a first part of a method for core dump deduplication according to an embodiment of the present principles; and

Figure 5 is a flowchart illustrating a second part of a method for core dump deduplication according to an embodiment of the present principles.

DESCRIPTION OF EMBODIMENTS

Figure 3 illustrates an exemplary system 300 implementing the present principles. The system 300 comprises a device 310 and a server 320 operably coupled through a network 330 such as for example the Internet.

The device 310 comprises at least one hardware processing unit ("processor") 311 configured to execute instructions of a first software program and to generate and send a core dump in case the first software program crashes, as will be further described hereinafter. The device 310 further comprises at least one memory 312 (for example ROM, RAM and Flash) configured to store the software program, data generated by the execution of the software program and core dump data. The device 310 also comprises at least one communications interface ("I/O") 313 configured to communicate with other devices, in particular the server 320.

The server 320 comprises at least one hardware processing unit ("processor") 321 configured to perform core dump deduplication with core dumps received from devices such as device 310. The server 320 further comprises memory 322 configured to store core dump data and at least one communications interface ("I/O") 323 configured to communicate with other devices, in particular the device 310. The server is preferably implemented as a single device, but its functionality can also be distributed over a plurality of devices (e.g., in the cloud).

Non-transitory storage media 315, 325 store instructions that, when executed by a processor, respectively perform the functions of the device 310 and the server 320 as further described hereinafter. The skilled person will appreciate that the illustrated device and server are very simplified for reasons of clarity and that real devices in addition would comprise features such as internal connections and power supplies.

Figure 4 is a flowchart illustrating a first part of a method for core dump file deduplication according to an embodiment of the present principles. At step S41, the (processor 311 of the) device 310 extracts the ELF header and the Program header table from a core dump file in the conventional ELF file format execution view (but it will be understood that other suitable file formats may be used). At step S42, the device 310 sends the extracted ELF header and Program header table to the server 320 that stores these. At step S43, the device 310 scans its memory for memory pages that contain the segments to be included in the core dump file to send to the server 320.
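The extraction of step S41 can be sketched as below for a 64-bit little-endian ELF file such as the one shown in Annex I. The function name extract_headers and the synthetic two-entry file are assumptions made for illustration; only the field layout of the ELF64 header and program header is taken from the ELF format.

```python
import struct

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 header, little endian, 64 bytes
PROGRAM_HEADER = struct.Struct("<IIQQQQQQ")      # ELF64 program header, 56 bytes


def extract_headers(core):
    """Return the raw ELF header, the raw Program header table and its parsed entries."""
    fields = ELF_HEADER.unpack_from(core, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    table = core[e_phoff:e_phoff + e_phentsize * e_phnum]
    entries = []
    for index in range(e_phnum):
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz, _align = \
            PROGRAM_HEADER.unpack_from(table, index * e_phentsize)
        entries.append({"type": p_type, "offset": p_offset, "vaddr": p_vaddr,
                        "filesz": p_filesz, "memsz": p_memsz})
    return core[:ELF_HEADER.size], table, entries


# Synthetic file with one NOTE and one LOAD entry; values borrowed from Annex I.
ident = bytes.fromhex("7f454c46020101") + bytes(9)
header = ELF_HEADER.pack(ident, 4, 62, 1, 0, 64, 0, 0, 64, 56, 2, 0, 0, 0)
note = PROGRAM_HEADER.pack(4, 0, 0x4A0, 0, 0, 0xB98, 0, 0)
load = PROGRAM_HEADER.pack(1, 5, 0x2000, 0x400000, 0, 0x1000, 0xB000, 0x1000)
sample_core = header + note + load

elf_header, ph_table, segments = extract_headers(sample_core)
```

The two byte strings returned first are what would be sent at step S42; the parsed entries give the mapping of virtual addresses to file offsets.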

The following steps may then be performed for at least one memory page. First, at step S44, the device hashes the memory page and sends, at step S45, the resulting hash value and the location of the memory page in the core dump to the server 320. The location of the memory page in the core dump can be given as at least one of the offset of the memory page in the ELF file, an incremental page identifier or the virtual memory address of the page. At step S46, the server checks if it stores a memory page whose hash value matches the received hash value. If this is not the case, i.e., in case the memory page has not been stored previously by the server, the server requests, at step S47, the page from the device that sends, at step S48, the requested memory page to the server that then stores the received memory page and the hash value, and links them. On the other hand, if the server does store a memory page that corresponds to the received hash value, then the server adds, at step S49, a reference to the existing data in the core dump file that is currently being transferred. In the Figure, steps S47-S49 are dashed to indicate that they are not necessarily performed at every iteration.
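Steps S44 to S49 can be sketched as follows, with the network exchange replaced by direct method calls. The Server class, the function send_core_dump, the use of SHA-256 and the use of an incremental page identifier as location are assumptions made for this example. For simplicity, the sketch records the location-to-hash association for every page, whether or not the page content had to be transferred.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size


class Server:
    """Hypothetical in-process stand-in for the server 320."""

    def __init__(self):
        self.pages = {}  # hash value -> memory page content
        self.dumps = {}  # core dump identifier -> {location: hash value}

    def offer(self, dump_id, location, digest):
        """S46: record the reference and report whether the page must be requested (S47)."""
        self.dumps.setdefault(dump_id, {})[location] = digest  # S49: reference by hash
        return digest not in self.pages

    def store(self, digest, page):
        """Store a page received at S48 together with its hash value."""
        self.pages[digest] = page


def send_core_dump(server, dump_id, pages):
    """Device side: hash (S44), send hash and location (S45), return page on request (S48)."""
    uploaded = 0
    for location, page in enumerate(pages):  # incremental page identifier
        digest = hashlib.sha256(page).hexdigest()
        if server.offer(dump_id, location, digest):
            server.store(digest, page)
            uploaded += 1
    return uploaded


server = Server()
heap = b"\x01" * PAGE_SIZE
stack_1 = b"\x02" * PAGE_SIZE
stack_2 = b"\x03" * PAGE_SIZE
first = send_core_dump(server, "crash-1", [heap, stack_1])
second = send_core_dump(server, "crash-2", [heap, stack_2])
```

In this example the second core dump transfers a single page, since the heap page is already held by the server.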

Figure 5 is a flowchart illustrating a second, optional part of a method for core dump deduplication according to an embodiment of the present principles. The second part of the method is performed by the server 320 once it has received the entire core dump from the device 310. It is noted that the server can begin the second part even before the entire core dump has been transferred but that it is preferred to wait until it has been transferred.

In the second part, the server analyses the stack trace. Using the program header and the received location of the memory pages, at step S51, the server translates each return address on the stack into a representative entity so that instance specific addresses are converted to generic addresses. In the preferred embodiment, the representative entity is a segment identifier and a segment offset; in other words, in which segment and at what offset from the beginning. In an alternative method the representative entity is a symbol. This removes any virtual addresses from the stack traces, which allows the stack trace to be deduplicated further.

To allow this, the server stores two kinds of hash values for each memory page. A first hash value is the hash value received from the device 310 and used during the client-server deduplication (as described with reference to Figure 4).

At step S52, the server generates a second hash value over the memory page stack trace from which the virtual addresses have been removed.

At step S53, the server verifies if it already stores an identical hash value, which has been calculated for another stack trace. If this is not the case, then the server stores the memory page and the second hash value at step S54.

On the other hand, if the server already stores an identical hash value, then it stores, at step S55, a link to the existing memory page with the identical hash value. The server can then delete the received memory page as it merely is a copy of an already stored memory page.

As can be seen, the second hash value is used only for deduplication within the server and is not part of the client-server deduplication. It will be appreciated that the first part of the method can provide important savings on bandwidth and storage and that the second part, if used, can provide further savings on storage.
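Steps S52 to S55 can be sketched as follows, assuming that the virtual addresses have already been removed from the page at step S51. The StackPageStore class, the textual page content and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


class StackPageStore:
    """Hypothetical server-side store keyed by the second hash value."""

    def __init__(self):
        self.pages = {}  # second hash value -> page without virtual addresses
        self.links = {}  # (core dump identifier, location) -> second hash value

    def add(self, dump_id, location, normalised_page):
        """Hash the page (S52), check for an identical hash (S53), store or link it."""
        second_hash = hashlib.sha256(normalised_page).hexdigest()  # S52
        is_new = second_hash not in self.pages                     # S53
        if is_new:
            self.pages[second_hash] = normalised_page              # S54
        # S54 or S55: in both cases the core dump refers to the page by its second hash.
        self.links[(dump_id, location)] = second_hash
        return is_new


store = StackPageStore()
# Placeholder content standing for a stack page after address translation.
page = b"segment 1 + 0x21f45; segment 0 + 0x1a2;"
stored_first = store.add("crash-1", 7, page)
stored_second = store.add("crash-2", 7, page)
```

The second call finds an identical hash value, so only a link is kept for the second core dump.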

In a preferred embodiment, the device uses Linux and executes embedded binaries that have been stripped and optimized; these binaries do not have debugging symbols.

In the first part of the preferred embodiment, at program crash the Linux kernel pipes the core dump to a program that initiates core dump transfer to the server. This prevents writing the core dump to the disk as is done in, for example, "The Core Pattern (core_pattern), or how to specify filename and path for core dumps" by aleksander [https://sigquit.wordpress.com/2009/03/13/the-core-pattern/], which is preferred as it is desired not to store the core dump on the device where there may be little or even no remaining disk or memory space.

As already described with reference to Figure 4, the device sends to the server the ELF header and the Program header table, which contain the addresses and offsets of the segments. Then, as also previously described, the device scans through the memory pages that contain the segments to be included in the core dump file and, for each memory page, calculates a hash value that is sent together with an incremental page identifier to the server. In case the content of the memory page has not been previously stored by the server, the server requests the page from the client that then also sends the content of the memory page.
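On Linux, a core_pattern value that begins with a "|" character causes the kernel to pipe the core dump to the standard input of the named program. A minimal sketch of the page-wise hashing performed by such a program is given below; the function iter_pages, the use of SHA-256 and the assumption that the stream can simply be cut every 4096 bytes are illustrative only. On the device the stream would be sys.stdin.buffer; an in-memory stream is used here so that the example is self-contained.

```python
import hashlib
import io

PAGE_SIZE = 4096  # matches "Page size: 4096" in Annex I


def iter_pages(stream, page_size=PAGE_SIZE):
    """Yield (incremental page identifier, hash value, page) without buffering the dump."""
    page_id = 0
    while True:
        page = stream.read(page_size)
        if not page:
            break  # end of the piped core dump
        yield page_id, hashlib.sha256(page).hexdigest(), page
        page_id += 1


# Stand-in for the piped core dump: two full pages and a short tail.
stream = io.BytesIO(b"\x00" * PAGE_SIZE + b"\x01" * PAGE_SIZE + b"\x02" * 10)
result = list(iter_pages(stream))
```

Only one page is held in memory at a time, which is in line with the aim of not storing the core dump on the device.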

The skilled person will appreciate that the method respects the conventional scanning order for core dumps, and that it thus can be executed on the fly without requiring storage or copying of parts of the memory. Furthermore, the method can allow reconstruction of the entire ELF core dump on the server.
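A simplified sketch of such a reconstruction is given below. It assumes that the server holds the header bytes received from the device, a mapping from incremental page identifier to hash value, and the stored pages indexed by hash value; the function reconstruct and the placeholder header bytes are hypothetical.

```python
import hashlib


def reconstruct(headers, references, pages):
    """Rebuild a core dump from stored headers and per-location page references."""
    body = b"".join(pages[references[location]] for location in sorted(references))
    return headers + body


# Hypothetical server state after the transfer of one core dump.
page_a = b"\xaa" * 4096
page_b = b"\xbb" * 4096
pages = {hashlib.sha256(p).hexdigest(): p for p in (page_a, page_b)}
references = {
    0: hashlib.sha256(page_a).hexdigest(),
    1: hashlib.sha256(page_b).hexdigest(),
    2: hashlib.sha256(page_a).hexdigest(),  # page 2 duplicates page 0
}
headers = b"\x7fELF" + bytes(60)  # placeholder for the stored header bytes
rebuilt = reconstruct(headers, references, pages)
```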

The deduplication is effective if portions of the heap are the same from one crash to another; even variable values will often have a limited number of possible values. However, the stack and program headers will very likely be different at each run, rendering deduplication ineffective for these portions.

In the second part of the preferred embodiment, which corresponds to the method described with reference to Figure 5, the server can reconstruct the entire ELF core dump file. The server then analyses the stack trace and changes return addresses to representative entities (i.e., resolves return addresses to symbols or translates return addresses on the stack into a segment identifier (preferably the hash) and a segment offset).
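The translation of a return address into a segment identifier and a segment offset can be sketched as follows. For simplicity the segment identifier is here the index of the segment in the Program header table rather than a hash; the function to_representative, the second base address and the offset 0x21f45 are hypothetical, while the first run uses addresses from Annex I.

```python
def to_representative(address, segments):
    """Translate a virtual address into (segment identifier, offset within the segment)."""
    for segment_id, (vaddr, memsz) in enumerate(segments):
        if vaddr <= address < vaddr + memsz:
            return segment_id, address - vaddr
    return None  # the address is not covered by any segment


# (virtual address, size in memory) of two segments in two runs of the same program.
run_1 = [(0x400000, 0xB000), (0x7F46FF98F000, 0x1BB000)]
run_2 = [(0x400000, 0xB000), (0x7F1200000000, 0x1BB000)]  # library loaded elsewhere

return_address_1 = 0x7F46FF98F000 + 0x21F45
return_address_2 = 0x7F1200000000 + 0x21F45

entity_1 = to_representative(return_address_1, run_1)
entity_2 = to_representative(return_address_2, run_2)
```

Although the two return addresses differ, both are translated into the same representative entity, which is what makes the translated stack pages comparable across crashes.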

The actual analysis of the core dumps and how the information in them can be used to improve the quality of the software program is beyond the scope of the present principles. Generally speaking, it is done offline by one or more programmers. The so-called gdb debugger can handle external symbol files using build-id [see "Debugging Information in Separate Files"], which makes it possible to debug the core dump on the remote server.

It will thus be appreciated that the present principles can provide a solution for core dump deduplication that can:

• Enable core dump collection for embedded devices without the need for writing the core dump on a disk of the embedded device and without the need for symbols in embedded binaries.

• Save on bandwidth and memory requirements during transfer and storage of core dumps.

• Enable fully exploitable core dumps on the server side.

It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces. Herein, the phrase "coupled" is defined to mean directly connected to or indirectly connected with through one or more intermediate components. Such intermediate components may include both hardware and software based components.

The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its scope.

All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

ANNEX I

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         20
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table:       0

There are no sections in this file.

There are no sections to group in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz             Flags  Align
  NOTE           0x00000000000004a0 0x0000000000000000 0x0000000000000000
                 0x0000000000000b98 0x0000000000000000        0
  LOAD           0x0000000000002000 0x0000000000400000 0x0000000000000000
                 0x0000000000001000 0x000000000000b000 R E    1000
  LOAD           0x0000000000003000 0x000000000060a000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000 R      1000
  LOAD           0x0000000000004000 0x000000000060b000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000 RW     1000
  LOAD           0x0000000000005000 0x0000000000f2f000 0x0000000000000000
                 0x0000000000021000 0x0000000000021000 RW     1000
  LOAD           0x0000000000026000 0x00007f46ff6c6000 0x0000000000000000
                 0x0000000000000000 0x00000000002c9000 R      1000
  LOAD           0x0000000000026000 0x00007f46ff98f000 0x0000000000000000
                 0x0000000000001000 0x00000000001bb000 R E    1000
  LOAD           0x0000000000027000 0x00007f46ffb4a000 0x0000000000000000
                 0x0000000000000000 0x00000000001ff000        1000
  LOAD           0x0000000000027000 0x00007f46ffd49000 0x0000000000000000
                 0x0000000000004000 0x0000000000004000 R      1000
  LOAD           0x000000000002b000 0x00007f46ffd4d000 0x0000000000000000
                 0x0000000000002000 0x0000000000002000 RW     1000
  LOAD           0x000000000002d000 0x00007f46ffd4f000 0x0000000000000000
                 0x0000000000005000 0x0000000000005000 RW     1000
  LOAD           0x0000000000032000 0x00007f46ffd54000 0x0000000000000000
                 0x0000000000001000 0x0000000000023000 R E    1000
  LOAD           0x0000000000033000 0x00007f46fff49000 0x0000000000000000
                 0x0000000000003000 0x0000000000003000 RW     1000
  LOAD           0x0000000000036000 0x00007f46fff74000 0x0000000000000000
                 0x0000000000002000 0x0000000000002000 RW     1000
  LOAD           0x0000000000038000 0x00007f46fff76000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000 R      1000
  LOAD           0x0000000000039000 0x00007f46fff77000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000 RW     1000
  LOAD           0x000000000003a000 0x00007f46fff78000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000 RW     1000
  LOAD           0x000000000003b000 0x00007ffde5732000 0x0000000000000000
                 0x0000000000022000 0x0000000000022000 RW     1000
  LOAD           0x000000000005d000 0x00007ffde57cd000 0x0000000000000000
                 0x0000000000002000 0x0000000000002000 R E    1000
  LOAD           0x000000000005f000 0xffffffffff600000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000 R E    1000

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

Dynamic symbol information is not available for displaying symbols.

No version information found in this file.

Displaying notes found at file offset 0x000004a0 with length 0x00000b98:
  Owner    Data size     Description
  CORE     0x00000150    NT_PRSTATUS (prstatus structure)
  CORE     0x00000088    NT_PRPSINFO (prpsinfo structure)
  CORE     0x00000080    NT_SIGINFO (siginfo_t data)
  CORE     0x00000130    NT_AUXV (auxiliary vector)
  CORE     0x00000241    NT_FILE (mapped files)
    Page size: 4096
    Start               End                 Page Offset
    0x0000000000400000  0x000000000040b000  0x0000000000000000
        /bin/cat
    0x000000000060a000  0x000000000060b000  0x000000000000000a
        /bin/cat
    0x000000000060b000  0x000000000060c000  0x000000000000000b
        /bin/cat
    0x00007f46ff6c6000  0x00007f46ff98f000  0x0000000000000000
        /usr/lib/locale/locale-archive
    0x00007f46ff98f000  0x00007f46ffb4a000  0x0000000000000000
        /lib/x86_64-linux-gnu/libc-2.19.so
    0x00007f46ffb4a000  0x00007f46ffd49000  0x00000000000001bb
        /lib/x86_64-linux-gnu/libc-2.19.so
    0x00007f46ffd49000  0x00007f46ffd4d000  0x00000000000001ba
        /lib/x86_64-linux-gnu/libc-2.19.so
    0x00007f46ffd4d000  0x00007f46ffd4f000  0x00000000000001be
        /lib/x86_64-linux-gnu/libc-2.19.so
    0x00007f46ffd54000  0x00007f46ffd77000  0x0000000000000000
        /lib/x86_64-linux-gnu/ld-2.19.so
    0x00007f46fff76000  0x00007f46fff77000  0x0000000000000022
        /lib/x86_64-linux-gnu/ld-2.19.so
  CORE     0x00000200    NT_FPREGSET (floating point registers)
  LINUX    0x00000340    NT_X86_XSTATE (x86 XSAVE extended state)

Claims

1. A method for deduplication of a core dump file, the method comprising:

scanning (S43), by a hardware processor (311) of a device (310), a memory (312) of the device (310) for at least one memory page containing segments to be included in a core dump file;

hashing (S44) the at least one memory page to obtain at least one hash value for the at least one memory page;

sending (S45) the at least one hash value for the at least one memory page and at least one location of the at least one memory page in the core dump file; and

upon reception of a request for the at least one memory page, returning (S48) the at least one memory page.

2. The method of claim 1, wherein the location is given as at least one of an offset of the memory page in a file, an incremental page identifier and a virtual memory address of the memory page.

3. The method of claim 1, wherein the core dump file is in an Executable and Linking Format (ELF) file format execution view and comprises an ELF header and a Program header table, the method further comprising extracting (S41) the ELF header and the Program header table from the core dump file and sending (S42) the extracted ELF header and Program header table.

4. A device (310) for deduplication of a core dump file, the device comprising: memory (312) for storing the core dump file; and

a hardware processor (311) coupled to the memory and configured to:

scan the memory for at least one memory page containing segments to be included in a core dump file to be sent;

for each memory page to be included in the core dump file to be sent: hash the memory page to obtain a hash value for the memory page; and send the hash value for the memory page and the location of the memory page in the core dump file; and

upon reception of a request for a memory page, return the memory page of the request.

5. The device of claim 4, wherein the location is given as at least one of an offset of the memory page in a file, an incremental page identifier and a virtual memory address of the memory page.

6. The device of claim 4, wherein the core dump file is in an Executable and Linking Format (ELF) file format execution view and comprises an ELF header and a Program header table, and wherein the hardware processor is further configured to extract the ELF header and the Program header table from the core dump file and send the extracted ELF header and Program header table.

7. A method for deduplication of a core dump file comprising:

receiving (S45), by a hardware processor (321) of an apparatus (320), a hash value for a memory page of a core dump file and a location of the memory page in the core dump file;

verifying (S46) whether a memory page whose hash value matches the received hash value is stored in memory (322) of the apparatus (320);

in case the memory (322) does not store such a memory page, requesting (S47) the memory page, receiving (S48) the requested memory page in response, and storing the requested memory page and the hash value; and

in case the memory (322) stores such a memory page, storing (S49), in the core dump file, a reference to the memory page stored in the memory (322).
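By way of non-limiting illustration, the server-side steps S45 to S49 could be sketched with an in-memory store keyed by hash value. The class `PageStore`, its method names, and the use of SHA-256 are assumptions made for the example only. As a simplification, the sketch records the reference for a location as soon as the hash value is received, whether or not the page is already stored.

```python
import hashlib


class PageStore:
    """Hypothetical in-memory page store keyed by hash value."""

    def __init__(self):
        self.pages = {}  # hash value -> page content
        self.dumps = {}  # core dump identifier -> {location: hash value}

    def receive_hash(self, dump_id, location, digest):
        """S45/S46: record the location; return True if the page must be requested (S47)."""
        # The core dump keeps a reference (the hash value) per location (S49).
        self.dumps.setdefault(dump_id, {})[location] = digest
        return digest not in self.pages

    def receive_page(self, page: bytes) -> str:
        """S48: store a received page together with its hash value."""
        digest = hashlib.sha256(page).hexdigest()
        self.pages[digest] = page
        return digest

    def rebuild(self, dump_id) -> bytes:
        """Reassemble a core dump from its references, in location order."""
        layout = self.dumps[dump_id]
        return b"".join(self.pages[layout[loc]] for loc in sorted(layout))
```

In this sketch, a second core dump announcing an already stored hash value causes no page transfer; only a reference is added.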

8. The method of claim 7, wherein the core dump file is in an Executable and Linking Format (ELF) file format execution view and comprises an ELF header and a Program header table, the method further comprising receiving (S42) the extracted ELF header and Program header table.

9. The method of claim 8, further comprising:

changing (S51), using the Program header table and the location of the memory pages, return addresses on a stack trace in the core dump file to representative entities; and

for each memory page for which the return addresses have been translated:

generating (S52) a second hash value;

verifying (S53) whether the second hash value is identical to a hash value calculated for another stack trace and stored in the memory (322);

in case no identical hash value is stored, storing (S54) the memory page and the second hash value in the memory (322); and

in case an identical hash value is stored, storing (S55) a link to a memory page corresponding to the stored identical hash value.
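As a non-limiting illustration of steps S51 and S52, the sketch below replaces each return address by a segment identifier and a segment offset, then hashes the result. It is a simplification: it operates on a list of return addresses rather than on raw memory page content, it represents each segment as a hypothetical (start address, size) pair taken from the Program header table, and it assumes SHA-256 as the hash function.

```python
import hashlib


def to_segment_offset(address, segments):
    """Map a virtual address to (segment index, offset within the segment)."""
    for index, (start, size) in enumerate(segments):
        if start <= address < start + size:
            return index, address - start
    return None  # the address lies outside every listed segment


def second_hash(return_addresses, segments):
    """Hash a stack trace after its return addresses have been made position independent."""
    normalized = [to_segment_offset(a, segments) for a in return_addresses]
    return hashlib.sha256(repr(normalized).encode()).hexdigest()
```

Two stack traces that differ only in the addresses at which their segments were loaded yield the same second hash value in this sketch, so the second stack trace could be stored as a link (S55) rather than as a new memory page (S54).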

10. An apparatus (320) for deduplication of a core dump file, the apparatus (320) comprising:

memory (322) configured to store memory pages of a core dump file and hash values corresponding to the memory pages; and

a hardware processor (321) configured to:

receive hash values for memory pages of the core dump file and locations of the memory pages;

for each received hash value, verify whether a memory page whose hash value matches the received hash value is stored in the memory (322);

in case the memory (322) does not store such a memory page, request the memory page, receive the requested memory page, and store the received memory page and the hash value in the memory (322); and

in case the memory (322) stores such a memory page, store a reference to the existing memory page in the core dump file.

11. The apparatus of claim 10, wherein the core dump file is in an Executable and Linking Format (ELF) file format execution view and comprises an ELF header and a Program header table, and wherein the hardware processor is further configured to receive the extracted ELF header and Program header table.

12. The apparatus of claim 11, wherein the hardware processor is further configured to:

change, using the Program header table and the location of the memory pages, return addresses on a stack trace into representative entities; and

for each memory page for which the return addresses have been translated:

generate a second hash value;

verify whether the second hash value is identical to a hash value calculated for another stack trace and stored in the memory (322);

in case no identical hash value is stored, store the memory page and the second hash value in the memory (322); and

in case an identical hash value is stored, store a link to a memory page corresponding to the stored identical hash value.

13. The apparatus of claim 12, wherein a representative entity is a symbol or a segment identifier and a segment offset.

14. Computer program comprising program code instructions executable by a processor for implementing the steps of a method according to any one of claims 1-3.

15. Computer program product which is stored on a non-transitory computer readable medium and comprises program code instructions executable by a processor for implementing the steps of a method according to any one of claims 1-3.