WO2016121024A1

WO2016121024A1 - Communication method, program, and communication device

Info

Publication number: WO2016121024A1
Application number: PCT/JP2015/052295
Authority: WO
Inventors: 真一佐沢; 裕亮亀山
Original assignee: 富士通株式会社
Priority date: 2015-01-28
Filing date: 2015-01-28
Publication date: 2016-08-04

Abstract

A communication method according to an aspect of the present invention comprises the steps of (A) measuring communication speed of a communication network by a computer for performing conversion of de-duplication with respect to partial data obtained by dividing transmission data and transmitting the converted data to the communication network, (B) measuring, by the computer, a total processing time for the conversion, and (C) switching, by the computer, to a system for not performing the conversion in the case where it is determined that a situation holds in which transmission efficiency deteriorates by performing the conversion, as compared to the case where the conversion is not performed, on the basis of the communication speed, a total volume of the partial data, the total processing time, and a total volume of the converted data.

Description

COMMUNICATION METHOD, PROGRAM, AND COMMUNICATION DEVICE

The present invention relates to a technique for eliminating duplication of transmission data.

For example, when data is transmitted over a WAN (Wide Area Network) between relay devices installed at each base, there is a technique that reduces the effective amount of communication by replacing any portion that duplicates previously transmitted data with an index. Such data conversion is called deduplication.

If the actual amount of communication is reduced by deduplication, the time required to transmit a large amount of data is expected to be shortened. This technique is applied to various facilities and communication environments.

However, in a situation that is not suitable for deduplication, transmission efficiency is not always improved.

JP 2010-224996 A

An object of the present invention is, in one aspect, to suppress inefficiency due to deduplication depending on the situation.

A communication method according to one aspect is performed by a computer that applies deduplication conversion to partial data divided from transmission data and sends the converted data to a communication network. The method includes processes in which the computer (A) measures the communication speed of the communication network, (B) measures the total processing time for the conversion, and (C) switches to a method that does not perform the conversion when it determines, based on the communication speed, the total amount of partial data, the total processing time, and the total amount of converted data, that performing the conversion degrades transmission efficiency compared with not performing the conversion.

As one aspect, inefficiency due to deduplication can be suppressed depending on the situation.

FIG. 1 is a diagram showing an outline of a network.
FIG. 2 is a diagram illustrating an example of the passage of time when deduplication is not performed.
FIG. 3 is a diagram illustrating an example of the passage of time when deduplication is performed.
FIG. 4 is a diagram illustrating an example of a sequence.
FIG. 5 is a diagram illustrating an example of a sequence.
FIG. 6A is a diagram illustrating an example of a sequence.
FIG. 6B is a diagram illustrating an example of a sequence.
FIG. 7 is a diagram illustrating an example of a sequence.
FIG. 8 is a diagram illustrating an example of a sequence.
FIG. 9 is a diagram illustrating an example of a sequence.
FIG. 10 is a diagram illustrating a module configuration example of the relay device.
FIG. 11 is a diagram illustrating a module configuration example of the first relay unit.
FIG. 12 is a diagram illustrating a configuration example of the parameter storage unit.
FIG. 13 is a diagram illustrating a module configuration example of the second relay unit.
FIG. 14 is a diagram illustrating an example of a measurement processing flow.
FIG. 15 is a diagram illustrating an example of a first relay processing flow.
FIG. 16 is a diagram illustrating an example of a first subroutine processing flow.
FIG. 17 is a diagram illustrating an example of a conversion processing flow.
FIG. 18 is a diagram illustrating an example of a third subroutine process (A) flow.
FIG. 19 is a diagram illustrating an example of the flow of the first calculation process (A).
FIG. 20 is a diagram illustrating an example of the flow of the second calculation process (A).
FIG. 21A is a diagram illustrating an example of a second subroutine processing flow.
FIG. 21B is a diagram illustrating an example of the second subroutine processing flow.
FIG. 22 is a diagram illustrating an example of the second relay processing flow.
FIG. 23 is a diagram illustrating an example of throughput magnification.
FIG. 24 is a diagram illustrating an example of the flow of the first calculation process (B).
FIG. 25 is a diagram illustrating an example of the flow of the second calculation process (B).
FIG. 26 is a diagram illustrating an example of a third subroutine process (B) flow.
FIG. 27 is a functional block diagram of a computer.

[Embodiment 1]
FIG. 1 shows an outline of the network. In this example, two bases are connected by a WAN. Inside one of the bases, a device (for example, the client terminal 101) is connected by a first LAN (Local Area Network). Inside the other base, a device (for example, the server device 105) is connected by a second LAN.

The performance when performing data communication between bases connected by a WAN in this way often depends on the communication speed in the WAN.

Therefore, a relay device may be provided at the boundary between the WAN and the LAN in order to reduce communication traffic in the WAN. The transmission side relay device compresses the data to be transmitted to reduce the amount of communication data. Alternatively, the relay device on the transmission side reduces the amount of communication data by removing duplicate portions included in the data to be transmitted, that is, omitting the same contents as the already transmitted data. In the present embodiment, attention is paid to deduplication.

In this example, a relay device 109 is provided at the boundary between the first LAN and the WAN. Further, a relay device 111 is provided at the boundary between the second LAN and the WAN. Note that the relay device is sometimes called a middle box.

In this example, it is assumed that application data is sent from the first application program 103 running on the client terminal 101 to the second application program 107 running on the server device 105.

When relaying application data, the relay device 109 provided at the base on the transmission side (hereinafter referred to as the transmission-side relay device 109) checks whether the data duplicates data transmitted or received in the past. Duplicated data is replaced with an index that specifies its storage position in the cache area. Hereinafter, the relay in the transmission-side relay device 109 is referred to as the first relay.

The relay device 111 provided at the reception-side base (hereinafter referred to as the reception-side relay device 111) acquires the data stored in the cache area based on the index sent from the transmission-side relay device 109. Hereinafter, the relay in the reception-side relay device 111 is referred to as the second relay.

For example, duplication occurs when the same application data that was previously sent from the first application program 103 to the second application program 107 is sent again from the first application program 103 to the second application program 107. Duplication also occurs when part of the application data sent in the past from the second application program 107 to the first application program 103 is modified in the first application program 103, and the modified application data is sent from the first application program 103 to the second application program 107.
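The index replacement and restoration described above can be pictured with the following minimal Python sketch. It is an illustration under assumptions, not the conversion process of the embodiment: the fixed chunk size, the SHA-256 digest, the names DedupCache, convert, and restore, and the ("raw", ...) / ("ref", ...) record format are all hypothetical. The sketch relies on both relay devices storing new chunks in the same order, so that the same index refers to the same data on both sides.

```python
import hashlib

CHUNK_SIZE = 8  # assumed fixed chunk size, chosen only for illustration


class DedupCache:
    """Hypothetical cache area plus hash table, one instance per relay device."""

    def __init__(self):
        self.chunks = []    # cache area: index -> chunk bytes
        self.index_of = {}  # hash table: digest -> index

    def lookup(self, chunk):
        """Return the index of a cached chunk, or None if it is not cached."""
        return self.index_of.get(hashlib.sha256(chunk).digest())

    def store(self, chunk):
        """Store a chunk if it is new and return its index."""
        digest = hashlib.sha256(chunk).digest()
        if digest not in self.index_of:
            self.index_of[digest] = len(self.chunks)
            self.chunks.append(chunk)
        return self.index_of[digest]


def convert(data, cache):
    """First relay: replace chunks already in the cache with their index."""
    converted = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        index = cache.lookup(chunk)
        if index is None:
            cache.store(chunk)
            converted.append(("raw", chunk))
        else:
            converted.append(("ref", index))
    return converted


def restore(converted, cache):
    """Second relay: rebuild the original data from raw chunks and indexes."""
    parts = []
    for kind, value in converted:
        if kind == "raw":
            cache.store(value)  # keep the receiver cache in step with the sender
            parts.append(value)
        else:
            parts.append(cache.chunks[value])
    return b"".join(parts)
```

In this sketch, sending the same data a second time yields only index records, which corresponds to the first duplication case described above.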

Next, the time required for the first relay that does not perform duplication removal and the time required for the first relay that performs duplication removal will be described.

Fig. 2 shows an example of the passage of time when duplicate removal is not performed. FIG. 2 shows the passage of time downward. The column on the left shows the passage of time in the relay apparatus 109 on the transmission side. The right column shows the passage of time in the receiving side relay device 111.

In addition, the first application program 103 divides the application data into n pieces of partial data and transmits them in n transmissions. The partial data is, for example, data called a segment. As shown in FIG. 2, the amount of the first partial data is W_{1a} bytes, the amount of the second partial data is W_{2a} bytes, the amount of the third partial data is W_{3a} bytes, and the amount of the n-th partial data is W_{na} bytes. These partial data are transmitted continuously. The limit on the amount of data that is transmitted continuously in this way is called the window size. The window size is set in the client terminal 101 and the server device 105; the transmission-side relay device 109 and the reception-side relay device 111 do not hold the window size value.

When sending data from the relay device 109 on the transmission side to the relay device 111 on the reception side, a delay occurs. Therefore, when the delay time L has elapsed from the time when the transmission processing is started in the transmission-side relay device 109, the reception processing is started in the reception-side relay device 111. Further, the time required for the communication of the partial data depends on the amount of the partial data and the communication speed S (denoted as S in FIG. 2) in the WAN. The unit of the communication speed S is bytes per second.

The time required for communication of the first partial data is W_{1a}/S. The time required for communication of the second partial data is W_{2a}/S. The time required for communication of the third partial data is W_{3a}/S. The time required for communication of the n-th partial data is W_{na}/S.

Finally, a response is sent from the reception-side relay device 111 that has received the n-th partial data to the transmission-side relay device 109, and a delay time L elapses until the response arrives at the transmission-side relay device 109.

Therefore, the required time in the first relay that does not perform deduplication (hereinafter referred to as the first estimated time) is estimated by Equation (1).

First estimated time = 2L + W_{1a}/S + W_{2a}/S + W_{3a}/S + ... + W_{na}/S   (1)

Here, when the total amount of partial data is represented by W_a, Equation (1) can be rewritten as Equation (2). Note that W_a/S corresponds to the total communication time T_a when deduplication is not performed.

First estimated time = 2L + W_a/S   (2)

Fig. 3 shows an example of the passage of time when deduplication is performed. As in FIG. 2, FIG. 3 shows the passage of time downward. Similarly, the left column shows the passage of time in the relay apparatus 109 on the transmission side. Similarly, the right column indicates the passage of time in the receiving-side relay device 111.

Suppose that application data is divided into n partial data under the same conditions as in the example shown in FIG. 2, and is sent out from the client terminal 101 in n times.

Data converted by deduplication is smaller than the original partial data. That is, the amount of the first converted data is W_{1b} bytes, which is smaller than the amount W_{1a} bytes of the first partial data. The amount of the second converted data is W_{2b} bytes, which is smaller than the amount W_{2a} bytes of the second partial data. The amount of the third converted data is W_{3b} bytes, which is smaller than the amount W_{3a} bytes of the third partial data. The amount of the n-th converted data is W_{nb} bytes, which is smaller than the amount W_{na} bytes of the n-th partial data.

Therefore, the time required for communication of the first converted data is W_{1b}/S, which is shorter than the time W_{1a}/S required for communication of the first partial data. The time required for communication of the second converted data is W_{2b}/S, which is shorter than the time W_{2a}/S required for communication of the second partial data. The time required for communication of the third converted data is W_{3b}/S, which is shorter than the time W_{3a}/S required for communication of the third partial data. The time required for communication of the n-th converted data is W_{nb}/S, which is shorter than the time W_{na}/S required for communication of the n-th partial data.

The time required for the first relay that performs deduplication includes the processing time for performing the deduplication conversion on each piece of partial data (hereinafter, this processing time is referred to as the conversion time). That is, the total from the conversion time P_1 for the first partial data to the conversion time P_n for the n-th partial data is included in the time required for the first relay that performs deduplication. The delay time L is the same as in the case of FIG. 2.

Therefore, the time required for the first relay that performs deduplication (hereinafter referred to as the second estimated time) is estimated by Equation (3).

Second estimated time = 2L + W_{1b}/S + W_{2b}/S + W_{3b}/S + ... + W_{nb}/S + P_1 + P_2 + P_3 + ... + P_n   (3)

Here, when the total amount of converted data is represented by W_b and the total conversion time for deduplication is represented by P, Equation (3) can be rewritten as Equation (4). Note that W_b/S corresponds to the total communication time T_b when deduplication is performed.

Second estimated time = 2L + W_b/S + P   (4)

In the present embodiment, the first estimated time and the second estimated time are compared, and control is performed so that the first relay is performed by a method corresponding to the shorter estimated time.
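As an illustration of this comparison, the following Python sketch evaluates Equations (2) and (4) and selects the method with the shorter estimated time. The function names and the numeric values are assumptions made for the example; treating a tie as "do not deduplicate" is also an assumption, since the text does not state how equal estimates are handled.

```python
def first_estimated_time(L, W_a, S):
    """Equation (2): estimated time without deduplication, in seconds."""
    return 2 * L + W_a / S


def second_estimated_time(L, W_b, S, P):
    """Equation (4): estimated time with deduplication, in seconds."""
    return 2 * L + W_b / S + P


def should_deduplicate(L, W_a, W_b, S, P):
    """Return True when the second estimated time is strictly shorter."""
    return second_estimated_time(L, W_b, S, P) < first_estimated_time(L, W_a, S)


# Illustrative values only: 50 ms delay, 1,000,000 bytes reduced to
# 400,000 bytes, and 0.2 s of total conversion time.
slow_link = should_deduplicate(L=0.05, W_a=1_000_000, W_b=400_000, S=1_000_000, P=0.2)
fast_link = should_deduplicate(L=0.05, W_a=1_000_000, W_b=400_000, S=10_000_000, P=0.2)
```

With these assumed values, the 1,000,000 bytes-per-second link gives estimates of about 1.1 s without deduplication and 0.7 s with it, while the 10,000,000 bytes-per-second link gives about 0.2 s and 0.34 s, so the preferred method differs between the two links.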

Which of the first estimated time and the second estimated time is shorter, that is, which method gives the higher transmission efficiency, depends on the situation. For example, when the communication band is narrow, that is, when communication is slow, performing deduplication tends to increase transmission efficiency. On the other hand, when the communication band is wide, that is, when communication is fast, performing deduplication does not always improve transmission efficiency.

Also, when the amount of data reduction by deduplication is large, the efficiency of transmission is likely to increase by performing deduplication. On the other hand, when the amount of data reduction by deduplication is small, the efficiency of transmission tends to decrease rather by performing deduplication.

In addition, when the amount of transmission data at one time is large, that is, when the window size is large, it is easy to increase the transmission efficiency by performing deduplication. On the other hand, when the amount of data transmitted at one time is small, that is, when the window size is small, the efficiency of transmission tends to decrease rather by performing deduplication.

Also, when the conversion process for deduplication is fast, the efficiency of transmission is likely to increase by performing deduplication. On the other hand, if the conversion process for deduplication is slow, performing the deduplication tends to lower the transmission efficiency.

Therefore, more efficient transmission of application data can be realized by performing deduplication according to the situation rather than performing deduplication uniformly.
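Several of these tendencies can be read directly from Equations (2) and (4). The short derivation below is a sketch that assumes S > 0 and P > 0 and compares the two estimated times; the delay term 2L appears in both and cancels.

```latex
\begin{align*}
2L + \frac{W_b}{S} + P &< 2L + \frac{W_a}{S} \\
P &< \frac{W_a - W_b}{S} \\
S &< \frac{W_a - W_b}{P}
\end{align*}
```

Under these equations, deduplication shortens the estimated time only when the total conversion time P is smaller than the communication time saved by the reduction W_a - W_b. A slower network (small S), a larger reduction, and a faster conversion (small P) each make the inequality easier to satisfy, which matches the tendencies involving communication speed, reduction amount, and conversion speed described above.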

Next, the sequence in this example will be described. FIG. 4 shows an example of a sequence up to the connection state. The relay device 109 performs initialization processing for WAN connection (S401). The relay apparatus 111 also performs initialization processing for WAN connection (S403). The relay device 109 confirms the WAN connection (S405). The relay apparatus 111 also confirms the WAN connection (S407).

This embodiment assumes a connection-type transmission control protocol such as TCP (Transmission Control Protocol). The first application program 103 transmits a connection request to the second application program 107 in a state where the WAN connection is confirmed between the relay device 109 and the relay device 111 (S409). The connection request is transferred by the relay device 109 (S411), and further transferred by the relay device 111 (S413).

When receiving the connection request (S415), the second application program 107 returns a response to the connection request (S417). The response is transferred in the relay device 111 (S419), and further transferred in the relay device 109 (S421).

When the first application program 103 receives the response (S423), the connection state is established. Thereafter, application data is transmitted in the connection state.

FIG. 5 shows an example of a sequence for transmitting application data. As described above, the application data is divided into n pieces of partial data and is transmitted in n transmissions. When the first application program 103 transmits the first partial data (S501), the transmission-side relay device 109 performs the first relay for the first time (S503), and the reception-side relay device 111 performs the second relay for the first time (S505). Then, the second application program 107 receives the first partial data (S507).

Subsequently, when the first application program 103 transmits the second partial data (S511), the transmission-side relay device 109 performs the first relay for the second time (S513), and the reception-side relay device 111 performs the second relay for the second time (S515). Then, the second application program 107 receives the second partial data (S517).

Subsequently, when the first application program 103 transmits the third partial data (S521), the transmission-side relay device 109 performs the first relay for the third time (S523), and the reception-side relay device 111 performs the second relay for the third time (S525). Then, the second application program 107 receives the third partial data (S527).

Finally, when the first application program 103 transmits the n-th partial data (S531), the transmission-side relay device 109 performs the first relay for the n-th time (S533), and the reception-side relay device 111 performs the second relay for the n-th time (S535). Then, the second application program 107 receives the n-th partial data (S537).

When the n-th partial data is received, the second application program 107 transmits a response (S541). The reception-side relay device 111 transfers the response (S543), and the transmission-side relay device 109 also transfers the response (S545). Then, the first application program 103 receives the response (S547). This completes the transmission of the application data.

The sequence shown in FIG. 5 is common to the case where deduplication is performed and the case where deduplication is not performed.

Subsequently, the first relay and the second relay in the case of performing deduplication will be described. FIG. 6A shows an example of a sequence in the first relay that performs deduplication. When receiving the partial data sent from the first application program 103 (S601), the transmission-side relay device 109 specifies the partial data amount (S603). Then, the transmission-side relay device 109 starts time measurement (S605) and performs conversion for deduplication (S607). When the conversion for deduplication is completed, the transmission-side relay device 109 ends the time measurement (S609). In this way, the transmission-side relay device 109 measures the conversion time for deduplication. Next, the transmission-side relay device 109 calculates a data reduction amount due to de-duplication (S611). In this way, the partial data amount obtained in the first relay, the conversion time for removing duplicates, and the data reduction amount are used in the determination of the situation described later.

Then, the transmission-side relay device 109 transmits the converted data to the reception-side relay device 111 (S613). Note that the processing from S601 to S613 will be described in detail later.
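The measurement steps S603 to S611 can be sketched in Python as follows. This is a hedged illustration only: the function name relay_with_measurement, the totals dictionary with keys W_a, P, and R, and the stand-in toy_convert function are hypothetical, and accumulating the totals inside this function is an assumption rather than the processing flow of the embodiment.

```python
import time


def relay_with_measurement(partial_data, convert, totals):
    """Measure one conversion and accumulate the values used for the determination."""
    amount = len(partial_data)                # S603: specify the partial data amount
    start = time.perf_counter()               # S605: start time measurement
    converted = convert(partial_data)         # S607: conversion for deduplication
    elapsed = time.perf_counter() - start     # S609: end time measurement
    reduction = amount - len(converted)       # S611: data reduction amount
    totals["W_a"] += amount
    totals["P"] += elapsed
    totals["R"] += reduction
    return converted                          # the caller would transmit this (S613)


# Stand-in conversion for the example: data seen before is replaced by a
# one-byte placeholder that plays the role of an index.
_seen = set()


def toy_convert(data):
    if data in _seen:
        return b"\x00"
    _seen.add(data)
    return data


totals = {"W_a": 0, "P": 0.0, "R": 0}
relay_with_measurement(b"x" * 100, toy_convert, totals)
relay_with_measurement(b"x" * 100, toy_convert, totals)
```

time.perf_counter is used here because it is a monotonic, high-resolution clock suited to measuring short intervals.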

FIG. 6B shows an example of a sequence in the second relay when deduplication is performed. The receiving-side relay device 111 receives the converted data (S615), and executes a restoration process on the converted data (S617). The receiving-side relay device 111 transmits the restored partial data to the second application program 107 (S619). The processing from S615 to S619 will also be described in detail later.

Fig. 7 shows an example of a sequence related to situation determination and switching when deduplication is performed. The relay apparatus 109 on the transmission side switches between a method for performing deduplication and a method for not performing deduplication using the control flag. In this example, when the control flag stored in the transmission-side relay device 109 is set to ON, the transmission-side relay device 109 performs the first relay described above. That is, if the control flag is ON, it means that a method for performing deduplication is selected. When the transmission of application data is completed when the control flag is set to ON (S701), the situation is determined (S703).

In the situation determination, the transmission-side relay device 109 calculates the first estimated time based on Equation (2), and further calculates the second estimated time based on Equation (4). It then determines which of the first estimated time and the second estimated time is shorter. If the first estimated time is shorter, it is better not to perform deduplication. If the second estimated time is shorter, it is better to perform deduplication.

If it is determined that the second estimated time is shorter, the control flag remains ON. Then, the next application data is similarly transmitted (S705), and the determination is similarly performed (S707).

In this way, transmission and determination of application data are repeated. While it is determined that the second estimated time is shorter, the control flag is not switched. That is, deduplication is continuously performed.

As described above, when application data is transmitted in a situation where the transmission efficiency tends to be lowered by performing deduplication (S709), it is determined in S711 that the first estimated time is shorter. As a result, the control flag is switched from ON to OFF (S713). Then, the first relay is performed in a manner that does not perform deduplication.

In this example, the situation is determined every time application data is transmitted, but the situation may be determined at the timing when the application data is transmitted a plurality of times.

Subsequently, the first relay and the second relay when no duplicate removal is performed will be described. FIG. 8 shows an example of a sequence in the first relay and the second relay when deduplication is not performed. When receiving the partial data sent from the first application program 103 (S801), the transmission-side relay device 109 transmits the received partial data to the reception-side relay device 111 (S803).

When receiving the partial data sent from the transmission-side relay device 109 (S805), the reception-side relay device 111 transmits the received partial data to the second application program 107 (S807).

FIG. 9 shows an example of a sequence relating to situation determination and switching when deduplication is not performed. When the control flag is set to OFF, the data for determining the situation is not obtained even when the transmission of application data is completed, so the situation is not determined at that point. Instead, the situation is checked at a predetermined timing. That is, deduplication is performed in the first relay as in the case where the control flag is set to ON, and the situation is then determined. Switching is then performed in the same manner as when the control flag is set to ON.

In the example of FIG. 9, deduplication is not performed in the first relay for the transmissions of application data shown in S901 and S903. On the other hand, in the transmission of application data shown in S905, deduplication is performed in the first relay, as in the case where the control flag is set to ON, in order to check the situation. In S907, the situation is determined. If it is determined in S907 that the first estimated time is shorter, the control flag is not switched.

Thereafter, deduplication is again not performed in the first relay for the transmissions of application data shown in S909 and S911. In the transmission of application data shown in S913, deduplication is performed in the first relay to check the situation in the same way as in S905. If it is determined in S915 that the second estimated time is shorter, the control flag is switched from OFF to ON in S917. From the next transmission onward, the first relay is performed by the method that performs deduplication.
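The periodic check while the control flag is OFF can be sketched as follows. The text only says that the situation is checked at a predetermined timing, so counting transmissions, as well as the names use_dedup_for_transfer and probe_interval, is an assumption made for illustration.

```python
def use_dedup_for_transfer(flag_on, transfers_since_probe, probe_interval):
    """Decide whether this application data transmission runs deduplication.

    flag_on: current value of the control flag.
    transfers_since_probe: transmissions sent without deduplication since the last check.
    probe_interval: assumed number of plain transmissions between checks.
    """
    if flag_on:
        return True
    # While the flag is OFF, deduplicate only on the transmission used to check the situation.
    return transfers_since_probe >= probe_interval


# With probe_interval=2 the pattern is: plain, plain, check.
pattern = [use_dedup_for_transfer(False, count, 2) for count in (0, 1, 2)]
```

The resulting pattern mirrors S901, S903, and S905 in FIG. 9 under this assumed counting rule.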

In FIGS. 7 and 9, switching is performed based on a single determination result, but switching may be performed based on a plurality of determination results. For example, switching may be performed when the same determination result is obtained a plurality of times.
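One way to picture switching on a plurality of determination results is the Python sketch below. It assumes that the results must be consecutive and that a fixed count is required; the text states neither, and the class name FlagSwitcher and its parameters are hypothetical.

```python
class FlagSwitcher:
    """Switch the control flag only after repeated determinations that disagree with it."""

    def __init__(self, flag_on=True, required=3):
        self.flag_on = flag_on    # True: the method that performs deduplication is selected
        self.required = required  # assumed number of consecutive results needed to switch
        self.streak = 0

    def observe(self, first_estimated_time, second_estimated_time):
        """Record one situation determination and return the resulting flag."""
        want_on = second_estimated_time < first_estimated_time
        if want_on == self.flag_on:
            self.streak = 0       # the result agrees with the current method
        else:
            self.streak += 1
            if self.streak >= self.required:
                self.flag_on = want_on
                self.streak = 0
        return self.flag_on
```

Requiring several results in a row avoids flipping the flag on a single outlying measurement, at the cost of reacting more slowly to a real change in the situation.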

Next, the configuration and operation of the relay device will be described. Note that the relay device 109 and the relay device 111 illustrated in FIG. 1 have the same configuration and operate in the same manner. That is, when application data is transmitted from the second application program 107 to the first application program 103, the relay device 111 performs the first relay as the transmission side, and the relay device 109 performs the second relay as the reception side.

Fig. 10 shows a module configuration example of the relay device. The relay device 109 includes an initialization unit 1001, a confirmation unit 1003, a connection unit 1005, a first relay unit 1007, a second relay unit 1009, and a cache data storage unit 1011.

The initialization unit 1001 performs initialization for WAN connection (for example, S401 shown in FIG. 4). The confirmation unit 1003 confirms WAN connection (for example, S405 illustrated in FIG. 4). The connection unit 1005 performs processing for establishing a connection (for example, S411 and S421 illustrated in FIG. 4).

The first relay unit 1007 performs relay as a transmission side, that is, the first relay. The first relay unit 1007 will be described in detail later with reference to FIG. 11. The second relay unit 1009 performs relay as a reception side, that is, the second relay. The second relay unit 1009 will be described in detail later with reference to FIG. 13. The cache data storage unit 1011 has a cache area for storing data to be saved. The cache data storage unit 1011 also has an area for storing a hash table.

FIG. 11 shows a module configuration example of the first relay unit 1007. The first relay unit 1007 includes a control unit 1101, a first reception unit 1103, a first transmission unit 1105, a conversion unit 1107, a first measurement unit 1109, a second measurement unit 1111, a third measurement unit 1113, a first summation unit 1115, a second summation unit 1117, a first calculation unit 1119, a second calculation unit 1121, a switching unit 1123, a flag storage unit 1125, and a parameter storage unit 1127.

The control unit 1101 controls processing in the first relay unit 1007. The first receiving unit 1103 receives, for example, partial data sent from a transmission source application program. Also, the first receiving unit 1103 receives a response from the receiving-side application program from the receiving-side relay device. The first transmission unit 1105 transmits, for example, converted data or partial data to the relay device on the reception side.

The conversion unit 1107 performs conversion processing for deduplication. The first measurement unit 1109 measures the time required for the conversion process for removing duplicates, that is, the conversion time. The second measuring unit 1111 measures a delay time L in communication performed with the partner relay device. The third measuring unit 1113 measures the communication speed S (bytes per second) in communication with the partner relay device.

The first summation unit 1115 obtains the total amount of continuous partial data. The second summation unit 1117 obtains the total amount of data reduction achieved by the conversion process for deduplication. The first calculation unit 1119 calculates the first estimated time by Equation (2). Hereinafter, the calculation by the first calculation unit 1119 is referred to as the first calculation. The second calculation unit 1121 calculates the second estimated time by Equation (4). Hereinafter, the calculation by the second calculation unit 1121 is referred to as the second calculation. The switching unit 1123 compares the first estimated time and the second estimated time, and switches the control flag based on the comparison result.

The flag storage unit 1125 stores a control flag. The parameter storage unit 1127 stores various parameters.

FIG. 12 shows a configuration example of the parameter storage unit 1127. The parameter storage unit 1127 stores a parameter 1201 for the total partial data amount W_a, a parameter 1203 for the total reduction amount R, a parameter 1205 for the total conversion time P, a parameter 1207 for the delay time L, a parameter 1209 for the communication speed S, a parameter 1211 for the first estimated time, and a parameter 1213 for the second estimated time.

FIG. 13 shows a module configuration example of the second relay unit 1009. The second relay unit 1009 includes a second reception unit 1301, a restoration unit 1303, and a second transmission unit 1305. The second receiving unit 1301 receives the converted data or partial data from the transmission-side relay device. The restoration unit 1303 restores partial data from the converted data. The second transmission unit 1305 transmits the restored partial data or the received partial data to the destination module.

The initialization unit 1001, confirmation unit 1003, connection unit 1005, first relay unit 1007, second relay unit 1009, control unit 1101, first reception unit 1103, first transmission unit 1105, conversion unit 1107, first measurement unit 1109, second measurement unit 1111, third measurement unit 1113, first summation unit 1115, second summation unit 1117, first calculation unit 1119, second calculation unit 1121, switching unit 1123, second reception unit 1301, restoration unit 1303, and second transmission unit 1305 are realized by using hardware resources (for example, FIG. 27) and a program that causes a processor to execute the processing described below.

The cache data storage unit 1011, the flag storage unit 1125, and the parameter storage unit 1127 described above are realized using hardware resources (for example, FIG. 27).

The first relay unit 1007 periodically performs a process of measuring the delay time L and the communication speed S. FIG. 14 shows an example of the measurement processing flow. The second measurement unit 1111 measures the response time of a round-trip communication performed with the partner relay device. Then, the second measurement unit 1111 obtains the delay time L of communication via the WAN by dividing the measured response time by 2 (S1401). The delay time L may be measured by a conventional technique. For example, the second measurement unit 1111 may execute the "ping" (Packet Internet Groper) command. The measured delay time L is set in the parameter 1207 of the delay time L.

The third measurement unit 1113 measures the communication speed S (bytes per second) in communication with the partner relay device (S1403). That is, the communication speed S corresponds to the communication bandwidth that is allowed to be used in the WAN. The communication speed S may be measured by a conventional technique. For example, the third measurement unit 1113 may use the tool “Iperf”. The measured communication speed S is set in the parameter 1209 of the communication speed S.

The control unit 1101 determines whether or not a predetermined time has elapsed (S1405). The predetermined time corresponds to an interval for measuring. If it is determined that the predetermined time has not elapsed, the control unit 1101 repeats the process of S1405. If it is determined that the predetermined time has elapsed, the process returns to S1401, and the above-described process is repeated.
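The measurement flow of S1401 to S1405 can be pictured with the following minimal sketch. It is only an illustration under stated assumptions: the callables `round_trip_time` and `throughput` are hypothetical stand-ins for whatever conventional technique (for example, PING or Iperf) supplies the measurements, and the dictionary `params` stands in for the parameter storage unit 1127.

```python
import time
from typing import Callable, Dict


def measure_once(round_trip_time: Callable[[], float],
                 throughput: Callable[[], float],
                 params: Dict[str, float]) -> None:
    """One pass of S1401 and S1403 (hypothetical helper)."""
    # S1401: the delay time L is half of the measured round-trip response time.
    params["L"] = round_trip_time() / 2.0
    # S1403: the communication speed S, in bytes per second.
    params["S"] = throughput()


def measurement_loop(round_trip_time: Callable[[], float],
                     throughput: Callable[[], float],
                     params: Dict[str, float],
                     interval_s: float,
                     passes: int) -> None:
    """Repeat the measurement; a real relay would loop indefinitely."""
    for _ in range(passes):
        measure_once(round_trip_time, throughput, params)
        # S1405: wait until the predetermined time has elapsed.
        time.sleep(interval_s)
```

Injecting the two measurement functions keeps the sketch independent of any particular tool; the bounded `passes` argument exists only so that the example terminates.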

In this example, the measurement process is in parallel with the first relay process shown in FIG. However, the process of S1401 and the process of S1403 may be included in the routine of the first relay process.

Next, the first relay process by the first relay unit 1007 will be described. Here, for convenience of explanation, the first relay process in the transmission-side relay device 109 is assumed. FIG. 15 shows an example of the first relay processing flow. The control unit 1101 initializes the parameters (S1501). Specifically, "0" is set in each of the parameter 1201 of the total partial data amount W_a, the parameter 1203 of the total reduction amount R, and the parameter 1205 of the total conversion time P.

The first receiving unit 1103 waits for and receives the first partial data from the client terminal 101 (S1503). Then, the control unit 1101 determines whether or not the control flag is ON (S1505). If it is determined that the control flag is ON, the first relay unit 1007 executes the first subroutine process (S1507) and returns to the process of S1501. The first subroutine process corresponds to the method that performs deduplication. On the other hand, if it is determined that the control flag is not ON, that is, if the control flag is OFF, the first relay unit 1007 executes the second subroutine process (S1509) and returns to the process of S1501. The second subroutine process corresponds to the method that does not perform deduplication. Hereinafter, the first subroutine process and the second subroutine process will be described in detail in order.

FIG. 16 shows an example of the first subroutine processing flow. The first summation unit 1115 identifies the amount of the target partial data and adds the identified amount to the value indicated by the parameter 1201 of the total partial data amount W_a (S1601). Initially, the partial data received in S1503 in FIG. 15 is the target; thereafter, the second and subsequent partial data determined to have been received in S1617 are the target.

The first measurement unit 1109 starts measuring the required time in the conversion process shown in S1605 (S1603). Then, the conversion unit 1107 executes a conversion process (S1605). In the conversion process, conversion for removing duplicates is performed on the target partial data.

FIG. 17 shows an example of the conversion process flow. First, the conversion unit 1107 divides the target partial data into chunks (S1701). The method of dividing into chunks is according to the prior art.

The conversion unit 1107 identifies one chunk among the divided chunks (S1703). The conversion unit 1107 identifies chunks in order from the top of the partial data, for example.

The conversion unit 1107 calculates a hash value for the identified chunk (S1705). The conversion unit 1107 may calculate a hash value using a conventional hash function (for example, SHA1). The hash value calculated at this time is an ID for identifying the original chunk, and is used as an index for reading the chunk stored in the cache area. Accordingly, the hash value calculated at this time is hereinafter referred to as an index. The hash value is sometimes called a message digest. The conversion unit 1107 searches the hash table using the index as a key (S1707).

When the chunk to be processed is already stored in the cache area, the index exists in the hash table. On the other hand, when the chunk to be processed is not stored in the cache area, the index does not exist in the hash table.

The conversion unit 1107 determines whether or not the index exists in the hash table (S1709). If it is determined that the index does not exist in the hash table, the conversion unit 1107 stores the chunk to be processed in the cache area (S1711). At this time, the conversion unit 1107 adds the index to the hash table (S1713). Then, the conversion unit 1107 adds the chunk to be processed to the converted data (S1715). The converted data is generated by sequentially adding chunks or indexes. Then, the process proceeds to S1719.

On the other hand, when it is determined that the index exists in the hash table, the conversion unit 1107 adds the index to the converted data (S1717). Then, the process proceeds to S1719.

The conversion unit 1107 determines whether there is an unprocessed chunk (S1719). If it is determined that there is an unprocessed chunk, the process returns to S1703 and the above-described process is repeated.

On the other hand, if it is determined that there is no unprocessed chunk, the conversion process ends, and the process proceeds to S1607 shown in FIG.
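The conversion process of S1701 to S1719 can be sketched as follows. This is a minimal illustration, not the relay device's implementation: the fixed `CHUNK_SIZE`, the `("chunk", ...)` / `("index", ...)` item encoding, and the use of a single dictionary that plays the role of both the hash table and the cache area are assumptions made for brevity. SHA-1 is used because the text names it as one possible hash function.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # hypothetical fixed chunk size; the text leaves chunking to prior art

# A converted item is either ("chunk", raw bytes) or ("index", SHA-1 digest).
Item = Tuple[str, bytes]


def convert(partial_data: bytes, cache: Dict[bytes, bytes]) -> List[Item]:
    """Replace chunks that are already cached with their index."""
    converted: List[Item] = []
    # S1701, S1703: split into chunks and take them in order from the top.
    for offset in range(0, len(partial_data), CHUNK_SIZE):
        chunk = partial_data[offset:offset + CHUNK_SIZE]
        # S1705: the hash value serves as the index of the chunk.
        index = hashlib.sha1(chunk).digest()
        # S1707, S1709: look the index up in the hash table.
        if index in cache:
            # S1717: a known chunk is sent as its index only.
            converted.append(("index", index))
        else:
            # S1711, S1713: store the new chunk and register its index.
            cache[index] = chunk
            # S1715: a new chunk is sent as it is.
            converted.append(("chunk", chunk))
    return converted
```

For example, converting `b"abcdabcdefgh"` with an empty cache yields a chunk, then an index for the repeated `abcd`, then a chunk for `efgh`.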

Returning to the explanation of FIG. The first measurement unit 1109 ends the measurement of the required time in the conversion process described above (S1607). Then, the first measuring unit 1109 adds the measured conversion time to the value indicated by the parameter 1205 of the total conversion time P (S1609).

The second summation unit 1117 calculates the data reduction amount by the conversion process (S1611). Specifically, the amount of data reduction is obtained by subtracting the amount of converted data from the amount of partial data. The second summation unit 1117 adds the calculated data reduction amount to the value indicated by the parameter 1203 of the total reduction amount R (S1613).

The first transmission unit 1105 transmits the converted data to the relay device 111 on the receiving side (S1615).
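The bookkeeping in S1601 to S1615 can be summarized by the sketch below. It assumes, purely for illustration, that the conversion is available as a callable `convert` returning the converted bytes, that `send` hands data to the receiving-side relay device, and that the totals W_a, R, and P are kept in a dictionary; none of these names come from the source.

```python
import time
from typing import Callable, Dict


def relay_with_dedup(partial_data: bytes,
                     convert: Callable[[bytes], bytes],
                     send: Callable[[bytes], None],
                     totals: Dict[str, float]) -> None:
    """Process one piece of partial data and update W_a, P, and R."""
    # S1601: add the amount of the partial data to the total W_a.
    totals["W_a"] += len(partial_data)
    # S1603 to S1607: time the conversion process.
    start = time.perf_counter()
    converted = convert(partial_data)
    elapsed = time.perf_counter() - start
    # S1609: add the measured conversion time to the total P.
    totals["P"] += elapsed
    # S1611, S1613: reduction = partial data amount - converted data amount.
    totals["R"] += len(partial_data) - len(converted)
    # S1615: send the converted data to the receiving-side relay device.
    send(converted)
```

Only the conversion itself is timed, so P reflects processing overhead and not transmission time.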

The first receiving unit 1103 determines whether or not the next partial data has been received from the client terminal 101 (S1617). If it is determined that the next partial data has been received from the client terminal 101, the process returns to S1601, and the above-described process is repeated.

On the other hand, if it is determined that the next partial data is not received from the client terminal 101, the first receiving unit 1103 determines whether a response is received from the relay device 111 on the receiving side (S1619). If it is determined that no response has been received from the receiving-side relay device 111, the process returns to S1617 and the above-described process is repeated.

On the other hand, if it is determined that a response has been received from the receiving-side relay device 111, the first relay unit 1007 executes a third subroutine process (S1621). In the third subroutine processing, processing corresponding to the determination (S703, S707, and S711) and switching (S713) shown in FIG. 7 is performed.

In the present embodiment, the third subroutine process (A) is performed. FIG. 18 shows an example of the third subroutine processing (A) flow. The first calculation unit 1119 executes a first calculation process (S1801). In the present embodiment, the first calculation process (A) is performed. In the first calculation process (A), the time required for estimation in the first relay that does not perform deduplication, that is, the first estimated time, is calculated according to Equation (2).

FIG. 19 shows an example of the flow of the first calculation process (A). The first calculation unit 1119 sets a value obtained by doubling the value indicated by the parameter 1207 of the delay time L in the parameter 1211 of the first estimated time (S1901). The first calculation unit 1119 calculates the total communication time T_a for the case where deduplication is not performed by dividing the value indicated by the parameter 1201 of the total partial data amount W_a by the value indicated by the parameter 1209 of the communication speed S (S1903). The first calculation unit 1119 adds the total communication time T_a for the case where deduplication is not performed to the value indicated by the parameter 1211 of the first estimated time (S1905). When the first calculation process (A) is completed, the process proceeds to S1803 shown in FIG.
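As a minimal sketch of the first calculation process (A), the function below follows S1901 to S1905 step by step. The function and argument names are illustrative assumptions; the units are seconds for the delay time and bytes and bytes per second for the data amount and the communication speed.

```python
def first_estimated_time(delay_l: float, total_w_a: float, speed_s: float) -> float:
    """Estimated time of the first relay without deduplication."""
    # S1901: start from twice the delay time L.
    estimate = 2.0 * delay_l
    # S1903: total communication time T_a = W_a / S.
    t_a = total_w_a / speed_s
    # S1905: add T_a to the estimate.
    return estimate + t_a
```

With the hypothetical values L = 0.05 s, W_a = 1,000,000 bytes, and S = 1,250,000 bytes per second, the estimate is 0.1 + 0.8 = 0.9 s.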

Returning to the explanation of FIG. The second calculation unit 1121 executes a second calculation process (S1803). In the present embodiment, the second calculation process (A) is performed. In the second calculation process (A), the required time for estimation in the first relay that performs deduplication, that is, the second estimated time, is calculated by Expression (4).

FIG. 20 shows an example of the flow of the second calculation process (A). The second calculation unit 1121 sets a value obtained by doubling the value indicated by the parameter 1207 of the delay time L in the parameter 1213 of the second estimated time (S2001). The second calculation unit 1121 adds the value indicated by the parameter 1205 of the total conversion time P to the value indicated by the parameter 1213 of the second estimated time (S2003). The second calculation unit 1121 calculates the total communication time T_b for the case where deduplication is performed (S2005). For example, the second calculation unit 1121 obtains the total amount of converted data by subtracting the value indicated by the parameter 1203 of the total reduction amount R from the value indicated by the parameter 1201 of the total partial data amount W_a. The total amount of converted data corresponds to the amount of data remaining after deduplication. The second calculation unit 1121 then obtains the total communication time T_b for the case where deduplication is performed by dividing the total amount of converted data by the value indicated by the parameter 1209 of the communication speed S.

The second calculation unit 1121 may calculate the amount of converted data by another procedure. The second calculation unit 1121 may total the amount of converted data, for example.

The second calculation unit 1121 adds the total communication time T_b for the case where deduplication is performed to the value indicated by the parameter 1213 of the second estimated time (S2007). When the second calculation process (A) is completed, the process proceeds to S1805 shown in FIG.
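The second calculation process (A), S2001 to S2007, can be sketched in the same style. The names are again illustrative assumptions; the total amount of converted data is derived as W_a minus R, which is the example procedure given in the text.

```python
def second_estimated_time(delay_l: float, total_p: float,
                          total_w_a: float, total_r: float,
                          speed_s: float) -> float:
    """Estimated time of the first relay with deduplication."""
    # S2001: start from twice the delay time L.
    estimate = 2.0 * delay_l
    # S2003: add the total conversion time P.
    estimate += total_p
    # S2005: total converted data = W_a - R; T_b = (W_a - R) / S.
    total_converted = total_w_a - total_r
    t_b = total_converted / speed_s
    # S2007: add T_b to the estimate.
    return estimate + t_b
```

With the hypothetical values L = 0.05 s, P = 0.3 s, W_a = 1,000,000 bytes, R = 600,000 bytes, and S = 1,250,000 bytes per second, the estimate is 0.1 + 0.3 + 0.32 = 0.72 s.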

Returning to the explanation of FIG. The switching unit 1123 compares the value indicated by the parameter 1211 of the first estimated time with the value indicated by the parameter 1213 of the second estimated time. In this example, the switching unit 1123 determines whether or not the first estimated time is equal to or shorter than the second estimated time (S1805). When it is determined that the first estimated time is equal to or shorter than the second estimated time, the switching unit 1123 sets the control flag to OFF (S1807). If the control flag has already been set to OFF before the processing of S1807, a method that does not perform de-duplication is continued. On the other hand, if the control flag is set to ON before the processing of S1807, the method for performing deduplication is switched to the method for not performing deduplication.

On the other hand, when it is determined that the first estimated time is not equal to or shorter than the second estimated time, that is, when the first estimated time is longer than the second estimated time, the switching unit 1123 sets the control flag to ON (S1809). If the control flag has already been set to ON before the processing of S1809, the method that performs deduplication is continued. On the other hand, if the control flag is set to OFF before the processing of S1809, the method that does not perform deduplication is switched to the method that performs deduplication.

When the third subroutine process is finished, the first subroutine process shown in FIG. 16 is also finished, and the process returns to the process of S1501 shown in FIG. Note that the processing of S1809 may be performed when the first estimated time and the second estimated time are equal.
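The determination in S1805 to S1809 reduces to a single comparison, sketched below. The function name and the boolean encoding of the control flag (True for ON, False for OFF) are assumptions for illustration.

```python
def next_control_flag(first_estimate: float, second_estimate: float) -> bool:
    """Return the control flag: True = perform deduplication, False = do not."""
    # S1805: is the estimate without deduplication equal to or shorter?
    if first_estimate <= second_estimate:
        # S1807: deduplication does not pay off; set the flag to OFF.
        return False
    # S1809: deduplication is expected to be faster; set the flag to ON.
    return True
```

The sketch resolves a tie in favor of OFF, as in S1805. The variation noted above, in which S1809 is performed when the two estimates are equal, would correspond to replacing `<=` with `<`.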

Subsequently, the second subroutine process will be described. FIG. 21A shows an example of the second subroutine processing flow. The control unit 1101 determines whether or not it corresponds to the timing for checking the situation (S2101). The timing for checking the situation is, for example, when a predetermined time or more has elapsed since the previous check. Alternatively, the timing for checking the situation may be when the number of transmissions of application data from the previous check exceeds a predetermined value. Alternatively, the timing for checking the situation may be determined by a predetermined period. Or you may make it determine the timing which checks a condition by another method.

If it is determined that the timing does not correspond to the timing for checking the situation, the first transmission unit 1105 transmits the partial data as it is to the relay device 111 on the reception side (S2103).

The first receiving unit 1103 determines whether or not the next partial data has been received from the client terminal 101 (S2105). If it is determined that the next partial data has been received from the client terminal 101, the process returns to S2103 and the above-described process is repeated.

On the other hand, if it is determined that the next partial data is not received from the client terminal 101, the first receiving unit 1103 determines whether a response is received from the relay device 111 on the receiving side (S2107). If it is determined that a response has not been received from the receiving-side relay device 111, the process returns to S2105 and the above-described process is repeated. If it is determined that a response has been received from the receiving-side relay device 111, the second subroutine process ends.

On the other hand, if it is determined in S2101 that it corresponds to the timing for checking the situation, the processing from S2109 to S2123 is performed as in the case of S1601 to S1615 shown in FIG. Then, the processing proceeds to the process of S2125 illustrated in FIG.

The processing from S2125 to S2129 is the same as that from S1617 to S1621 shown in FIG. If it is determined in S2125 that the next partial data has been received from the client terminal 101, the process returns to S2109 shown in FIG. The third subroutine process is the same as that executed in the first subroutine process (S1621 in FIG. 16). When the third subroutine processing in S2129 is finished, the second subroutine processing is also finished, and the processing returns to S1501 shown in FIG.

Next, the second relay process by the second relay unit 1009 will be described. Here, for convenience of explanation, the second relay process in the receiving-side relay device 111 is assumed. FIG. 22 shows an example of the second relay processing flow. The second reception unit 1301 waits for and receives the converted data or partial data from the transmission-side relay device 109 (S2201). The restoration unit 1303 executes a restoration process on the converted data (S2203). The restoration process is based on a conventional method. A chunk included in the received data is added to the restored data as it is. For an index included in the received data, the hash table is searched using the index as a key. Then, the data corresponding to the original chunk is acquired from the cache area, and the acquired data is added to the restored data. The restored data is generated by sequentially adding chunks. In this way, when the restoration process is executed on the converted data, the original partial data is obtained. If it is determined that partial data, rather than converted data, has been received, the restoration process need not be performed.

The second transmission unit 1305 transmits the partial data restored in this way to the destination module (in the example shown in FIG. 6B, the second application program 107) (S2205). This is the end of the description of the configuration and operation of the relay device.
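The restoration process of S2203 can be sketched as follows. The text leaves restoration to a conventional method, so this is only one plausible shape under explicit assumptions: converted data is a list of `("chunk", raw bytes)` and `("index", SHA-1 digest)` items, a single dictionary stands in for the receiving side's hash table and cache area, and the receiving side registers each raw chunk it sees so that later indexes can be resolved.

```python
import hashlib
from typing import Dict, List, Tuple

Item = Tuple[str, bytes]  # ("chunk", raw bytes) or ("index", SHA-1 digest)


def restore(converted: List[Item], cache: Dict[bytes, bytes]) -> bytes:
    """Rebuild the original partial data from converted data."""
    restored = bytearray()
    for kind, payload in converted:
        if kind == "chunk":
            # Assumption: remember the chunk so that a later index can find it.
            cache[hashlib.sha1(payload).digest()] = payload
            # A chunk is added to the restored data as it is.
            restored += payload
        else:
            # An index is used as the key to fetch the original chunk.
            restored += cache[payload]
    return bytes(restored)
```

An index that is missing from the cache raises `KeyError` in this sketch; how a real relay device would handle such a mismatch is outside what the text describes.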

Finally, an application example of this embodiment is shown. FIG. 23 shows an example of the throughput magnification. The outlined bar indicates the throughput magnification when this embodiment is adopted. This magnification is the value obtained by dividing the throughput when this embodiment is adopted, that is, the throughput when deduplication is performed depending on the situation, by the throughput when deduplication is not performed.

The filled bar indicates the throughput magnification when the conventional technology is adopted. The throughput magnification in the case of employing the conventional technique is a value obtained by dividing the throughput in the case of employing the conventional technique, that is, the throughput in the case of always performing deduplication, by the throughput in the case of not performing deduplication.

FIG. 23 shows two throughput magnifications under five communication conditions. In a relatively low speed state, that is, when the communication speed is 1 Mbps and the communication speed is 10 Mbps, there is no difference between the two throughput magnifications. In either case, high efficiency is achieved by performing de-duplication.

And when the communication speed is 20 Mbps, both of the two throughput magnifications are almost one. When the communication speed is 20 Mbps, it is indicated that there is no difference in the transmission efficiency of application data between the method of performing deduplication and the method of not performing deduplication.

On the other hand, in a relatively high speed state, that is, when the communication speed is 50 Mbps and when the communication speed is 100 Mbps, the throughput magnification in the conventional method is 1 or less. This is because the transmission efficiency of the method that performs deduplication is lower than the transmission efficiency of the method that does not perform deduplication.

On the other hand, in the case of the present embodiment, since the method is switched to a method that does not perform de-duplication, the throughput magnification is almost 1. As described above, even when the method for performing deduplication is disadvantageous, the transmission performance of application data does not deteriorate.

According to this embodiment, inefficiency due to deduplication can be suppressed depending on the situation. That is, it is possible to avoid a situation in which transmission efficiency is deteriorated by performing deduplication.

Also, since the communication time is calculated by dividing the amount of data by the communication speed, it is not necessary to measure the communication time.

[Embodiment 2]
In the present embodiment, an example of comparing estimated times excluding the delay time L will be described.

Both the calculation formula for the first estimated time shown in Equation (2) and the calculation formula for the second estimated time shown in Equation (4) include a 2L term. Therefore, even if 2L is not added to the estimated time in either the first calculation process for calculating the first estimated time or the second calculation process for calculating the second estimated time, a determination result equivalent to that of the first embodiment is obtained.

That is, both the first estimated time and the second estimated time in the present embodiment are estimated values of the required time of the first relay excluding the delay time. The first estimated time in the present embodiment is obtained by the following equation.

First estimated time = W_a / S    (5)

The second estimated time in the present embodiment is obtained by the following equation.

Second estimated time = W_b / S + P    (6)

In the present embodiment, in S1801 of the third subroutine process shown in FIG. 18, the first calculation process (B) is executed instead of the first calculation process (A) shown in the first embodiment. Further, in S1803 of the third subroutine process shown in FIG. 18, the second calculation process (B) is executed instead of the second calculation process (A) shown in the first embodiment.

FIG. 24 shows an example of the first calculation process (B) flow. The first calculation unit 1119 sets “0” to the parameter 1211 of the first estimation time (S2401). The processes in S1903 and S1905 are the same as in the first calculation process (A).

FIG. 25 shows an example of the flow of the second calculation process (B). The second calculator 1121 sets “0” to the parameter 1213 of the second estimated time (S2501). The processes from S2003 to S2007 are the same as those in the second calculation process (A).

In this case, the process of measuring the delay time L in S1401 of FIG. 14 may be omitted, and therefore the second measurement unit 1111 may also be omitted.

Note that the comparison formula in the determination processing shown in S1805 of FIG. 18 is represented by the following formula in the case of the present embodiment.

W_a / S ≦ W_b / S + P    (7)
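The reason the delay time drops out can be written as a short derivation. It uses only quantities defined in the first embodiment: the first estimated time is 2L plus T_a = W_a / S, and the second estimated time is 2L plus P plus T_b = W_b / S.

```latex
% Comparison of S1805 in the first embodiment:
2L + \frac{W_a}{S} \le 2L + P + \frac{W_b}{S}
% Subtracting 2L from both sides leaves the comparison of this embodiment:
\frac{W_a}{S} \le \frac{W_b}{S} + P
```

Because the same 2L term appears on both sides, the determination result is unchanged for any value of L.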

According to the present embodiment, the process of measuring the delay time L can be omitted.

[Embodiment 3]
In the present embodiment, an example will be described in which the situation is determined using the data reduction rate c by deduplication or the data remaining rate k (= 1-c) after deduplication.

If the reduction rate c is used, the total converted data amount W_b can be obtained by the following equation.

W_b = (1 − c) W_a    (8)

Applying equation (8) to equation (7) yields the following equation:

P ≧ c W_a / S    (9)

Therefore, equation (9) is equivalent to equation (7). In the present embodiment, the situation is determined according to Equation (9).
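The substitution can be spelled out step by step, using only Equations (7) and (8):

```latex
\begin{align*}
\frac{W_a}{S} &\le \frac{W_b}{S} + P && \text{Equation (7)} \\
\frac{W_a}{S} &\le \frac{(1-c)\,W_a}{S} + P && \text{substituting Equation (8)} \\
\frac{W_a}{S} - \frac{(1-c)\,W_a}{S} &\le P \\
\frac{c\,W_a}{S} &\le P && \text{Equation (9)}
\end{align*}
```

Each step is reversible, which is why the two conditions are equivalent and not merely implied one way.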

In the present embodiment, in the third subroutine processing (S1621) in the first subroutine processing shown in FIG. 16, the third subroutine processing (B) is executed instead of the third subroutine processing (A) shown in the first embodiment. Further, in the third subroutine processing (S2129) in the second subroutine processing shown in FIG. 21B, the third subroutine processing (B) is executed instead of the third subroutine processing (A) shown in the first embodiment.

FIG. 26 shows an example of the third subroutine processing (B) flow. The switching unit 1123 calculates the data reduction rate c by deduplication by dividing the value indicated by the parameter 1203 of the total reduction amount R by the value indicated by the parameter 1201 of the total partial data amount W_a (S2601). If it is assumed that the data reduction rate c does not fluctuate, the processing shown in S2601 may be omitted and the already calculated data reduction rate c may be used.

As in the processing of S1903 shown in FIG. 19, the switching unit 1123 calculates the total communication time T_a for the case where deduplication is not performed by dividing the value indicated by the parameter 1201 of the total partial data amount W_a by the value indicated by the parameter 1209 of the communication speed S (S2603).

The switching unit 1123 determines whether the total conversion time P for deduplication is equal to or greater than the product of the data reduction rate c and the total communication time T_a for the case where deduplication is not performed (S2605). When it is determined that the total conversion time P is equal to or greater than the product, the processing is the same as the processing in S1807 shown in FIG. 18. When it is determined that the total conversion time P is less than the product, the processing is the same as the processing in S1809 shown in FIG. 18. The product of the data reduction rate c and the total communication time T_a corresponds to the amount by which the conversion shortens the total communication time.
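The third subroutine processing (B) can be sketched as one small function. The function name, argument names, and the boolean encoding of the control flag (True for ON) are illustrative assumptions; the arithmetic follows S2601 to S2605.

```python
def control_flag_from_reduction_rate(total_r: float, total_w_a: float,
                                     total_p: float, speed_s: float) -> bool:
    """Return True (flag ON) when deduplication is expected to pay off."""
    # S2601: data reduction rate c = R / W_a.
    c = total_r / total_w_a
    # S2603: total communication time without deduplication, T_a = W_a / S.
    t_a = total_w_a / speed_s
    # S2605: c * T_a is the communication time saved by the conversion.
    if total_p >= c * t_a:
        # Same as S1807: the conversion costs at least as much as it saves.
        return False
    # Same as S1809: the conversion saves more time than it costs.
    return True
```

With hypothetical totals R = 600,000 bytes, W_a = 1,000,000 bytes, and P = 0.3 s, a speed of 1,250,000 bytes per second gives a saving of 0.48 s and keeps deduplication ON, while a speed of 12,500,000 bytes per second gives a saving of only 0.048 s and turns it OFF.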

Note that the data remaining rate k after deduplication may be used instead of the data reduction rate c by deduplication. The following equation is equivalent to equation (9).

P ≧ (1 − k) W_a / S    (10)

The switching unit 1123 may calculate the data remaining rate k instead of the data reduction rate c. In this case, for example, the switching unit 1123 calculates the data remaining rate k by subtracting the value indicated by the parameter 1203 of the total reduction amount R from the value indicated by the parameter 1201 of the total partial data amount W_a, and dividing the difference by the value indicated by the parameter 1201 of the total partial data amount W_a.

In addition, the switching unit 1123 makes a determination based on Expression (10) instead of Expression (9) in S2605. In addition, when it is assumed that the data remaining rate k does not fluctuate, the data remaining rate k that has already been calculated may be used.
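The relationship between the two rates, and the reason Equations (9) and (10) give the same threshold, can be checked directly from the definitions of c and k in terms of the totals R and W_a:

```latex
\begin{align*}
c &= \frac{R}{W_a}, \qquad k = \frac{W_a - R}{W_a} = 1 - c \\
\frac{(1-k)\,W_a}{S} &= \frac{c\,W_a}{S} = \frac{R}{S}
\end{align*}
```

As a hypothetical numeric check, W_a = 1,000,000 bytes and R = 600,000 bytes give c = 0.6 and k = 0.4, and both forms of the threshold equal 600,000 / S.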

According to the present embodiment, when it is assumed that the data reduction rate c does not fluctuate, the amount of processing for calculating the reduction rate c can be reduced. Similarly, when it is assumed that the data remaining rate k does not fluctuate, the amount of processing for calculating the remaining rate k can be reduced.

Although the embodiment of the present invention has been described above, the present invention is not limited to this. For example, the functional block configuration described above may not match the program module configuration.

Further, the configuration of each storage area described above is an example, and the configuration as described above is not necessarily required. Further, in the processing flow, if the processing result does not change, the processing order may be changed or a plurality of processes may be executed in parallel.

Note that the relay device described above is a computer device. As shown in FIG. 27, in the relay device, a memory 2501, a CPU (Central Processing Unit) 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. An operating system (OS) and an application program for executing the processing in this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing content of the application program so that they perform predetermined operations. Data in the middle of processing is mainly stored in the memory 2501, but may be stored in the HDD 2505. In the embodiment of the present invention, the application program for executing the above-described processing is stored and distributed on the computer-readable removable disk 2511 and installed in the HDD 2505 from the drive device 2513. In some cases, the application program may be installed in the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer device realizes the various functions described above through organic cooperation between hardware such as the CPU 2503 and the memory 2501 and programs such as the OS and the application program.

The embodiments of the present invention described above are summarized as follows.

In the communication method according to the present embodiment, a computer that performs deduplication conversion on partial data divided from transmission data and sends the converted data to a communication network executes a process including: (A) measuring the communication speed of the communication network; (B) measuring the total processing time of the conversion; and (C) switching to a method that does not perform the conversion when it is determined, based on the communication speed, the total amount of the partial data, the total processing time, and the total amount of the converted data, that a situation applies in which performing the conversion degrades transmission efficiency compared with not performing the conversion.

In this way, inefficiency due to duplicate removal can be suppressed depending on the situation. That is, it is possible to avoid a situation in which transmission efficiency is deteriorated by performing deduplication.

Furthermore, a first communication time may be calculated by dividing the total amount of the partial data by the communication speed, and a second communication time may be calculated by dividing the total amount of the converted data by the communication speed. In the switching process, whether the above situation applies may be determined based on a comparison result between a first estimated time including the first communication time and a second estimated time including the sum of the second communication time and the total processing time.

In this way, since the communication time is obtained by dividing the total amount of data by the communication speed, it is not necessary to measure the time required for sending the data to be transmitted and the time required for sending the converted data.

Further, in the switching process, based on the data reduction rate or remaining rate by the conversion, a total reduction amount of communication time by the conversion is calculated, and based on a comparison result between the total processing time and the reduction amount. It may be determined whether the above situation is applicable.

In this way, when it is assumed that the data reduction rate and the remaining rate do not fluctuate, the processing amount of the reduction rate or the remaining rate can be reduced.

A program for causing a computer to perform the processing according to the above method can be created. The program may be stored in a computer-readable storage medium such as a flexible disk, a CD-ROM, a magneto-optical disk, a semiconductor memory, or a hard disk, or in a storage device. Note that intermediate processing results are generally stored temporarily in a storage device such as a main memory.

Claims

A communication method in which a computer that performs deduplication conversion on partial data divided from transmission data and sends the converted data to a communication network executes a process comprising:
measuring a communication speed of the communication network;
measuring a total processing time of the conversion; and
switching to a method that does not perform the conversion when it is determined, based on the communication speed, a total amount of the partial data, the total processing time, and a total amount of the converted data, that a situation applies in which performing the conversion degrades transmission efficiency compared with not performing the conversion.
Furthermore,
The first communication time is calculated by dividing the total amount of the partial data by the communication speed,
Dividing the total amount of the converted data by the communication speed to calculate a second communication time;
In the switching process,
The determination is made based on a comparison result between a first estimated time including the first communication time and a second estimated time including a sum of the second communication time and the processing time. The communication method described.
The communication method according to claim 1, wherein, in the switching process,
a total reduction amount of communication time due to the conversion is calculated based on the data reduction rate or remaining rate due to the conversion, and whether the situation applies is determined based on a comparison result between the total processing time and the reduction amount.
A program for causing a computer that performs deduplication conversion on partial data divided from transmission data and sends the converted data to a communication network to execute a process comprising:
measuring a communication speed of the communication network;
measuring a total processing time of the conversion; and
switching to a method that does not perform the conversion when it is determined, based on the communication speed, a total amount of the partial data, the total processing time, and a total amount of the converted data, that a situation applies in which performing the conversion degrades transmission efficiency compared with not performing the conversion.
A communication device that performs deduplication conversion on partial data divided from transmission data and sends the converted data to a communication network, the communication device comprising:
a first measuring unit that measures a communication speed of the communication network;
a second measuring unit that measures a total processing time of the conversion; and
a switching unit that switches to a method that does not perform the conversion when it is determined, based on the communication speed, a total amount of the partial data, the total processing time, and a total amount of the converted data, that a situation applies in which performing the conversion degrades transmission efficiency compared with not performing the conversion.