WO2015155881A1

WO2015155881A1 - Information processing system, which assists with selection of test case, and control method for said information processing system

Info

Publication number: WO2015155881A1
Application number: PCT/JP2014/060484
Authority: WO
Inventors: 陽介加賀; 長野　裕史
Original assignee: 株式会社日立製作所
Priority date: 2014-04-11
Filing date: 2014-04-11
Publication date: 2015-10-15

Abstract

[Problem] To assist with the selection of a test case for the purpose of efficiently and reliably performing a software test. [Solution] An information processing system (1) that assists with the selection of a test case: generates an N-dimensional frequency distribution which is a distribution of frequencies of appearance with regard to a combination of values that are based on data to be input into software to be tested and can be taken on by N factors; and estimates the frequency of appearance of a prescribed data pattern on the basis of the generated N-dimensional frequency distribution. The information processing system (1) outputs a frequency list (821) in which the estimated frequency of appearance with regard to each of a plurality of data patterns is disclosed and in which frequency of appearance cumulative values that have cumulated from the highest frequency of appearance are entered alongside. Furthermore, the information processing system (1) outputs: an excluded data pattern table (822) which is information representing data patterns in which the estimated frequency of appearance exceeds a prescribed value; and a data pattern frequency table (823) which is information representing data patterns in which the estimated frequency of appearance is equal to or less than the prescribed value.

Description

Information processing system for supporting test case selection and control method thereof

The present invention relates to an information processing system that supports selection of a test case and a control method thereof.

In an information processing system that takes a large amount of data stored in a database as input and produces output, the number of combinations of data that may be input becomes enormous. Therefore, when performing software testing on such an information processing system, it is necessary to narrow down the test cases appropriately and perform the test efficiently.

In this regard, for example, Patent Document 1 describes that, in order to generate program test data with a small amount of data while ensuring the necessary diversity, test data is generated that covers all patterns for combinations of any N parameters (M ≥ N > 1) out of a total of M parameters that can be input to a program.

JP 2006-227958 A

In Patent Document 1, test data is generated so as to cover all combinations of arbitrary N parameters. For this reason, for example, if most of the defects included in the software are generated by a combination of N or less parameters, most of the defects can be detected with a small number of generated test cases.

However, when many of the defects included in the software occur with combinations of more than N parameters, the defects cannot be detected sufficiently. In addition, the number of defects that can be detected can be increased by increasing N. However, the number of test cases increases, the load on the test increases, and the time required for the test also increases.

In addition, since the method of Patent Document 1 is intended to cover any combination of N parameters, it is difficult to grasp quantitatively how much of the data patterns that can actually occur can be verified by a test. Furthermore, if the data actually in use could be accessed when generating test cases, it would be possible to determine how frequently each test case occurs. In many cases, however, the data actually in use contains confidential information such as personal information, and such data often cannot be used effectively.

An object of the present invention is to support the selection of test cases for efficiently and surely testing software, and to improve the work efficiency of software development and repair.

One aspect of the present invention for achieving the above object is an information processing system that supports selection of a test case. The system generates an N-dimensional frequency distribution, which is a distribution of the appearance frequency of combinations of values that can be taken by N elements based on data input to the software to be tested, and estimates the appearance frequency of a predetermined data pattern based on the N-dimensional frequency distribution.

The other problems disclosed in the present application and the solutions thereof will be clarified by the description of the mode for carrying out the invention and the drawings.

According to the present invention, it is possible to support selection of a test case for efficiently and surely testing software, and to improve work efficiency related to software development and repair.

FIG. 1 is a diagram illustrating a schematic configuration of the test case selection support system 1.
FIG. 2 is a hardware configuration example of the information processing apparatus 200 that can be used as the frequency distribution generation device 110 and the frequency estimation device 120.
FIG. 3 is a flowchart explaining the N-dimensional frequency distribution generation process S300.
FIG. 4 is an example of the data specification table 211.
FIG. 5 is an example of the N-dimensional frequency distribution table 221.
FIG. 6 is an example of the business DB table 212.
FIG. 7 is a flowchart explaining the N-dimensional frequency distribution reflection process S305.
FIG. 8 is a flowchart explaining the frequency estimation process S800.
FIG. 9 is an example of the input data pattern table 811.
FIG. 10 is an example of the frequency list 821.
FIG. 11 is an example of the excluded data pattern table 822.
FIG. 12 is an example of the data pattern frequency table 823.
FIG. 13 is a flowchart explaining the data pattern frequency estimation process S804.
FIG. 14 is a flowchart explaining the data pattern frequency output process S807.
FIG. 15 is a flowchart explaining the data pattern frequency estimation process S1500.
FIG. 16 is an example of a tree structure 1600 that the frequency estimation device 120 refers to when searching for a data pattern.

Hereinafter, embodiments will be described in detail with reference to the drawings.

= First embodiment =
FIG. 1 shows a schematic configuration of an information processing system (hereinafter referred to as a test case selection support system 1) described as the first embodiment. The test case selection support system 1 is used, for example, when the business system 100 used in actual business is modified and a test (acceptance test) is performed on the modified software (including programs and data) or on newly added software.

The test case selection support system 1 generates information serving as a determination criterion when a user selects a test case, using data registered in a database of the business system 100 (hereinafter also referred to as a business DB 101). Specifically, the test case selection support system 1 performs statistical processing on data stored in the business DB 101 to generate a distribution of the appearance frequencies of combinations of N elements (hereinafter also referred to as an N-dimensional frequency distribution), and generates information serving as a criterion for selecting a test case using the generated N-dimensional frequency distribution.

Here, the N-dimensional frequency distribution does not include highly confidential information such as personal information, and its data volume is greatly reduced compared with the data stored in the business DB 101. Therefore, the N-dimensional frequency distribution can be taken out from the operation site of the business system 100 and used at the software development site, and the test can be performed safely and efficiently. Moreover, since the test case selection support system 1 generates information serving as a criterion for selecting a test case using data registered in the business DB 101 used for actual business, it is possible to increase the coverage of the data patterns that can actually occur and to perform high-quality tests.

As shown in FIG. 1, the test case selection support system 1 includes a frequency distribution generation device 110 and a frequency estimation device 120. These are all realized by using one or more information processing apparatuses (computers). In the business system 100, a business DB 101, which is a database for managing data used in actual business, is operating. Various software (not shown) that performs information processing using the business DB 101 also runs in the business system 100. Data is exchanged between the business system 100 and the frequency distribution generation device 110, and between the frequency distribution generation device 110 and the frequency estimation device 120, via, for example, a communication unit or a recording medium. The business system 100, the frequency distribution generation device 110, and the frequency estimation device 120 may be realized by independent information processing devices, or any two or more may be realized by the same information processing device.

The frequency distribution generation device 110 reads data (or a data set) from the business DB 101 and generates an N-dimensional frequency distribution based on the read data. In addition, the frequency estimation device 120 reads the N-dimensional frequency distribution generated by the frequency distribution generation device 110 and estimates the appearance frequency for an arbitrary data pattern.

FIG. 2 is an example of an information processing device (computer) that implements the frequency distribution generation device 110 and the frequency estimation device 120. As shown in the figure, the information processing apparatus 200 includes a processor 201, a main storage device 202, an auxiliary storage device 203, an input device 204, a display device 205, and a communication device 206. These are communicably connected via communication means such as a bus (not shown).

The processor 201 is configured using, for example, a CPU (Central Processing Unit) and an MPU (Micro Processing Unit). Various functions of the information processing apparatus 200 are realized by the processor 201 reading and executing a program stored in the main storage device 202.

The main storage device 202 is a device that stores programs and data, and is, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), an NVRAM (Non Volatile RAM), or the like. The auxiliary storage device 203 is a hard disk drive, an SSD (Solid State Drive), an optical storage device, or the like. Programs and data stored in the auxiliary storage device 203 are loaded into the main storage device 202 as needed.

The input device 204 is a user interface that receives input of information and instructions from the user, and is, for example, a keyboard, a mouse, or a touch panel. The display device 205 is a user interface that provides information to the user, and is, for example, a graphic card, a liquid crystal monitor, an LCD (Liquid Crystal Display), or the like. The communication device 206 is a communication interface that communicates with other devices via a communication network, and is, for example, a NIC (Network Interface Card).

As shown in FIG. 1, the frequency distribution generation device 110 includes functions of a data specification reading unit 111, a business DB reading unit 112, a frequency distribution generation unit 113, a frequency distribution output unit 114, and a frequency distribution DB 115. These functions are realized by the processor 201 reading and executing a program stored in the main storage device 202.

The data specification reading unit 111 reads the contents of the data specification table 211. Details of the data specification table 211 will be described later. The business DB reading unit 112 reads data from the business DB 101 while referring to the data specifications read by the data specification reading unit 111. The frequency distribution generation unit 113 performs statistical processing on the data read by the business DB reading unit 112 and generates an N-dimensional frequency distribution. The frequency distribution output unit 114 outputs the N-dimensional frequency distribution generated by the frequency distribution generation unit 113 to the frequency distribution DB 115. The frequency distribution DB 115 is a database (hereinafter also referred to as “DB”) managed by a DBMS (DataBase Management System). The frequency distribution DB 115 stores the N-dimensional frequency distribution output by the frequency distribution output unit 114 and provides the frequency estimation apparatus 120 with the contents of the N-dimensional frequency distribution.

As shown in FIG. 1, the frequency estimation device 120 includes functions of a data pattern reading unit 121, a frequency distribution reading unit 122, a frequency estimation unit 123, a data pattern search unit 124, and a data pattern frequency output unit 125. These functions are realized by the processor 201 reading and executing a program stored in the main storage device 202.

The data pattern reading unit 121 reads a data pattern that is a target of appearance frequency estimation from the input data pattern table 811. The frequency distribution reading unit 122 reads the N-dimensional frequency distribution generated by the frequency distribution generation device 110 and stored in the frequency distribution DB 115. The frequency estimation unit 123 estimates the appearance frequency of the data pattern read by the data pattern reading unit 121 based on the N-dimensional frequency distribution read by the frequency distribution reading unit 122. The data pattern search unit 124 performs a depth-first search over the data patterns that can be input to the software to be tested and estimates the appearance frequency of each data pattern acquired during the search based on the N-dimensional frequency distribution, thereby extracting data patterns efficiently. The data pattern frequency output unit 125 extracts data patterns having a high appearance frequency based on the appearance frequency of the data patterns estimated by the frequency estimation unit 123, and outputs the extracted data patterns.

Next, processing performed in the test case selection support system 1 having the above configuration will be described in detail.

<N-dimensional frequency distribution generation processing>
FIG. 3 is a flowchart for explaining processing performed by the frequency distribution generation device 110 (hereinafter also referred to as N-dimensional frequency distribution generation processing S300). By performing the N-dimensional frequency distribution generation processing S300, the frequency distribution generation device 110 selects N elements from data composed of M elements in the business DB 101 and generates an N-dimensional frequency distribution using the selected N elements. In the following, it is assumed that the dimension N of the frequency distribution is in the range 0 < N ≤ M. In addition, the N-dimensional frequency distribution is generated for all combinations in which N elements are extracted from the M elements.

As shown in the figure, the frequency distribution generation device 110 first reads the contents of the data specification table 211 (S301).

FIG. 4 shows an example of the data specification table 211. As shown in the figure, the data specification table 211 defines the specifications of data registered in the business DB 101. The data specification table 211 is set in advance by a user or the like, for example.

Data item ID (hereinafter also referred to as DID 401) is an identifier assigned to each data item of data registered in the business DB 101. The data item name 402 is the name of the data item included in the business DB 101; for example, a character string such as “name”, “date of birth”, “subscription period”, or “average payment amount” is set.

The frequency distribution application presence / absence 403 is a flag indicating whether or not to generate a frequency distribution for the data item. “1” is set when the frequency distribution is generated, and “0” is set when it is not. In the example of FIG. 4, the character string “name” is not a target for generating a frequency distribution, so “0” is set in the frequency distribution application presence / absence 403; the other data items are all targets, so “1” is set for each of them.

Boundary value 404 is a list of boundary values that divide the range of values each data item can take. In this example, values used in the equivalence analysis method from the field of software testing are set as the boundary values. In the equivalence analysis method, values at which the behavior of the software changes are listed as boundary values, and representative values are extracted from the sections divided by the boundary values and used for testing. This makes it possible to verify the behavior of the software with a minimum number of tests. For example, in FIG. 4, “0”, “12”, ..., “480”, at which the behavior of the software changes, are set as the boundary values of the “subscription period”. The boundary values are set by a user or the like based on, for example, software specifications or source code. If it is difficult to set the boundary values from the software specifications or source code, they may be set mechanically, for example by setting 10 values at equal intervals from the minimum value to the maximum value.
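The mechanical setting of boundary values mentioned above can be illustrated with a minimal sketch. The function name `equal_interval_boundaries` and the choice to include both the minimum and the maximum value among the 10 values are assumptions made for illustration; the text only states that 10 values are placed at equal intervals from the minimum toward the maximum.

```python
def equal_interval_boundaries(min_value, max_value, count=10):
    """Return `count` boundary values spaced evenly from min_value to max_value.

    Assumption: both end points are included among the boundary values.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (max_value - min_value) / (count - 1)
    return [min_value + k * step for k in range(count)]


# Hypothetical data item whose observed values range from 0 to 450.
boundaries = equal_interval_boundaries(0, 450)
```

Boundary values produced this way carry no knowledge of where the software behavior changes, so they serve only as a fallback when specifications or source code are not available.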

Returning to FIG. 3, in S302 the frequency distribution generation apparatus 110 generates the initial N-dimensional frequency distribution table 221. Specifically, the frequency distribution generation apparatus 110 obtains the combinations of N elements from the data items whose frequency distribution application presence / absence 403 is “1”, and generates an N-dimensional frequency distribution table 221 corresponding to each obtained combination. For example, when there are 10 data items whose frequency distribution application presence / absence 403 is “1” and N = 2, the frequency distribution generation device 110 obtains 45 (= 10C2) combinations and generates an N-dimensional frequency distribution table 221 corresponding to each combination.
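As a rough illustration of S302, the following sketch enumerates the combinations of N data items and prepares one empty table per combination. The function name `init_frequency_tables`, the dictionary representation of a table, and the DID values are hypothetical and are not taken from the specification.

```python
from itertools import combinations
from math import comb


def init_frequency_tables(target_dids, n):
    """Create one empty frequency table for each combination of n data item IDs.

    target_dids: DIDs whose frequency distribution application flag is 1.
    Each table will later map a tuple of section numbers to a frequency.
    """
    return {combo: {} for combo in combinations(sorted(target_dids), n)}


# Ten hypothetical data items (DID 2 to 11) with the flag set to 1, N = 2.
tables = init_frequency_tables(range(2, 12), 2)
number_of_tables = len(tables)  # equals 10C2
```

The count grows quickly with N (for 10 data items, N = 3 already gives 120 tables), which is one practical reason to keep N small.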

FIG. 5 shows an example of the N-dimensional frequency distribution table 221. The figure shows only the N-dimensional frequency distribution table for one specific combination. In the figure, a first element 501 indicates the section number (an identifier uniquely given to each section divided by the boundary values) of the first element z of the N elements. In this example, section numbers are assigned, in ascending order, to the sections delimited by the boundary values 404 defined in the data specification table 211. For example, if the boundary values 404 are “0”, “10”, ..., the section number of z is “1” when z < 0 and “2” when 0 < z ≤ 10. Similarly, the Nth element 502 indicates the section number of the Nth element among the N elements. The frequency 503 is set to the number of data, among the data registered in the business DB 101, that belong to the corresponding sections of the first element 501 through the Nth element 502.
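The mapping from a value to its section number can be sketched with a binary search over the sorted boundary values. The function name `section_number` is hypothetical. The text does not say which section a value exactly equal to the lowest boundary belongs to; this sketch assumes that every boundary value belongs to the section below it (upper bound inclusive), which matches the example "0 < z ≤ 10 is section 2".

```python
from bisect import bisect_left


def section_number(value, boundaries):
    """Return the 1-based section number of `value`.

    boundaries must be sorted in ascending order. A value equal to a
    boundary is placed in the section that ends at that boundary
    (an assumption of this sketch).
    """
    return bisect_left(boundaries, value) + 1


boundaries = [0, 10]
below_first = section_number(-5, boundaries)    # z < 0
inside = section_number(5, boundaries)          # 0 < z <= 10
on_boundary = section_number(10, boundaries)    # z == 10
above_last = section_number(11, boundaries)     # z > 10
```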

Returning to FIG. 3, the frequency distribution generation apparatus 110 then substitutes 1 for a variable i (S303), and reads the record whose ID is i (hereinafter also referred to as business DB data) from the table of the business DB 101 (hereinafter referred to as the business DB table 212) (S304).

FIG. 6 shows an example of the business DB table 212. As shown in the figure, this business DB table 212 is composed of a plurality of records having items such as ID 601, name 602, date of birth 603, subscription period 604, and average payment amount 605. The ID 601 is an identifier (hereinafter also referred to as a record ID) assigned to each record of the business DB 101.

Returning to FIG. 3, the frequency distribution generation device 110 then reflects the contents of the obtained i-th business DB data in the temporary N-dimensional frequency distribution table 221 (S305). The details of this process (hereinafter also referred to as N-dimensional frequency distribution reflection process S305) will be described later.

Subsequently, the frequency distribution generation device 110 adds 1 to the variable i (S306), and determines whether or not the variable i exceeds the number of data in the business DB 101 (or a preset number of repetitions) (S307). If the variable i exceeds the number of data in the business DB 101 (S307: Yes), the process proceeds to S308. If the variable i does not exceed the number of data in the business DB 101 (S307: No), the process returns to S304.

In S308, the frequency distribution generation device 110 outputs the contents of the temporary N-dimensional frequency distribution table 221 to the N-dimensional frequency distribution table 221.

FIG. 7 is a flowchart for explaining the details of the N-dimensional frequency distribution reflecting process S305 in FIG. 3. The N-dimensional frequency distribution reflecting process S305 will be described below with reference to FIG. 7. For simplicity of explanation, the case of N = 2 is described below as an example. In the case of N > 2, it is necessary to introduce new variables in addition to the existing variables (m, n) and to use a process having N nested loops.

As shown in the figure, first, the frequency distribution generation device 110 substitutes “1” for the variable m (S701). This variable m corresponds to the DID 401 of the first element described above.

Subsequently, the frequency distribution generation device 110 refers to the data in the data specification table 211, reads the value of the frequency distribution application presence / absence 403 of the data whose DID 401 is m, and determines whether or not the read value is “1”. (S702). When the read value is “1”, the process proceeds to S703, and when the read value is “0”, the process proceeds to S709.

In S703, the frequency distribution generation device 110 substitutes “m + 1” for the variable n. This variable n corresponds to the DID 401 of the second element. The reason for “m + 1” is that the object to be considered is a combination of two variables m and n, and it is sufficient to consider only when m <n.

Subsequently, the frequency distribution generation device 110 refers to the data specification table 211, reads the value of the frequency distribution application presence / absence 403 of the data item whose DID 401 is n, and determines whether or not it is “1” (S704). When the value of the frequency distribution application presence / absence 403 is “1” (S704: Yes), the process proceeds to S705. When the value of the frequency distribution application presence / absence 403 is “0” (S704: No), the process proceeds to S707.

Subsequently, the frequency distribution generation device 110 extracts the value whose DID 401 is m from the business DB data whose ID 601 (record ID) is i in the business DB table 212, compares the extracted value with the data specification table 211 to obtain the corresponding section number, and assigns the section number to the variable s. In addition, the frequency distribution generation apparatus 110 extracts the value whose DID 401 is n from the business DB data whose ID 601 is i in the business DB table 212, compares the extracted value with the data specification table 211 to obtain the corresponding section number, and assigns the section number to the variable t (S705).

Subsequently, the frequency distribution generation device 110 updates the row corresponding to the first element DID = m and the second element DID = n in the temporary N-dimensional frequency distribution table 221 (S706). Specifically, the frequency distribution generation device 110 determines whether or not the N-dimensional frequency distribution table 221 contains a row with the first element section number 501 = s and the second element section number 502 = t. If such a row exists, 1 is added to the frequency 503 of that row. If it does not exist, a new row is added to the temporary N-dimensional frequency distribution table 221, and in the added row s is set as the first element section number 501, t is set as the second element section number 502, and 1 is set as the frequency 503.

Subsequently, the frequency distribution generation device 110 adds 1 to the variable n (S707), and determines whether the variable n exceeds the total number of data items in the business DB table 212 (S708). When the variable n exceeds the total number of data items in the business DB table 212 (S708: Yes), the processing from S709 is performed. If the variable n does not exceed the total number of data items in the business DB table 212 (S708: No), the processing from S704 is performed.

Subsequently, the frequency distribution generation device 110 adds 1 to the variable m (S709), and determines whether the variable m exceeds the total number of data items in the business DB table 212 (S710). If the variable m exceeds the total number of data items (S710: Yes), the process ends. When the variable m does not exceed the total number of data items (S710: No), the processing from S702 is performed.

Through the processing described above, the data stored in the business DB 101 is converted into an N-dimensional frequency distribution.
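For the case N = 2, the overall flow of S300 and S305 can be sketched as follows. This is only an illustration under stated assumptions: the names `build_2d_distribution` and `section_number`, the dictionary layout of `spec` and `records`, and the sample values are all hypothetical, and a value equal to a boundary is assumed to belong to the section below it. Instead of the explicit m and n loop variables of FIG. 7, the sketch iterates over the pairs m < n directly.

```python
from bisect import bisect_left
from collections import Counter
from itertools import combinations


def section_number(value, boundaries):
    """1-based section number; a boundary value belongs to the lower section."""
    return bisect_left(boundaries, value) + 1


def build_2d_distribution(records, spec):
    """Build one 2-dimensional frequency table per pair of target data items.

    spec:    {DID: (application_flag, sorted_boundaries)}
    records: list of {DID: value}, one dict per business DB record
    Returns  {(m, n): Counter({(s, t): frequency})} with m < n.
    """
    targets = sorted(did for did, (flag, _) in spec.items() if flag == 1)
    tables = {pair: Counter() for pair in combinations(targets, 2)}
    for record in records:
        # Convert each target value to its section number once per record.
        sections = {did: section_number(record[did], spec[did][1]) for did in targets}
        for m, n in tables:
            tables[(m, n)][(sections[m], sections[n])] += 1
    return tables


# Hypothetical specification: DID 1 (name) is excluded from the distribution.
spec = {
    1: (0, []),
    2: (1, [0, 12, 24]),     # subscription period
    3: (1, [1000, 5000]),    # average payment amount
}
records = [
    {1: "A", 2: 6, 3: 800},
    {1: "B", 2: 6, 3: 900},
    {1: "C", 2: 30, 3: 7000},
]
tables = build_2d_distribution(records, spec)
```

The resulting tables hold only section numbers and counts, so the names in the sample records never leave the function.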

<Frequency estimation processing>
Next, processing (hereinafter also referred to as frequency estimation processing S800) in which the frequency estimation device 120 estimates the appearance frequency of the data pattern based on the N-dimensional frequency distribution generated as described above will be described.

FIG. 8 is a flowchart for explaining the frequency estimation process S800. By performing the frequency estimation process S800 on preset data patterns using the N-dimensional frequency distribution table 221 generated by the frequency distribution generation apparatus 110, the frequency estimation apparatus 120 estimates the appearance frequency of each data pattern.

As shown in the figure, first, the frequency estimation device 120 reads an input data pattern table 811 which is a table in which preset data patterns are registered (S801).

FIG. 9 shows an example of the input data pattern table 811. In the figure, a pattern ID 901 is an identifier (hereinafter also referred to as a pattern ID) that is uniquely assigned to each data pattern. Reference numerals 902 to 904 denote the DID 401 of the data specification table 211 corresponding to each data item. The numerical value at the position where a pattern ID 901 and any of the reference numerals 902 to 904 intersect is a section number. In this example, only the data items of the data specification table 211 whose frequency distribution application presence / absence 403 is “1” are targeted. The frequency estimation device 120 estimates the appearance frequency of each of the data patterns stored in the input data pattern table 811.

Referring back to FIG. 8, the frequency estimation apparatus 120 then substitutes 1 for the variable i (S802), and reads the i-th data pattern from the input data pattern table 811 (S803).

Subsequently, the frequency estimation device 120 estimates the appearance frequency for the i-th data pattern and outputs the frequency list 821 (S804). The frequency estimation device 120 estimates the appearance frequency based on the N-dimensional frequency distribution table 221 generated by the frequency distribution generation device 110. Details of this processing (hereinafter also referred to as data pattern frequency estimation processing S804) will be described later.

FIG. 10 shows an example of the frequency list 821. As shown in the figure, the frequency list 821 includes one or more records having three items: a pattern ID 1001, a frequency estimated value 1002, and a frequency upper limit value 1003. A pattern ID 1001 is a pattern ID in the input data pattern table 811. The frequency estimation value 1002 is a value (probability) indicating how much the data pattern is included in the business DB 101. The frequency upper limit value 1003 is an upper limit value (probability) of the appearance frequency of the data pattern in the business DB 101.

Returning to FIG. 8, the frequency estimation apparatus 120 adds 1 to the variable i (S805), and determines whether or not the variable i exceeds the total number of patterns stored in the input data pattern table 811 (S806). If the variable i exceeds the total number of patterns (S806: Yes), the process proceeds to S807. If the variable i does not exceed the total number of patterns (S806: No), the process returns to S803.

In S807, the frequency estimation apparatus 120 outputs an excluded data pattern table 822 and a data pattern frequency table 823. Details of this process (hereinafter also referred to as data pattern frequency output process S807) will be described later.

FIG. 11 shows an example of the excluded data pattern table 822 output by the frequency estimation device 120 in S807. The excluded data pattern table 822 is a table that stores data patterns whose estimated appearance frequency is determined to be equal to or less than a preset threshold value T%. As shown in the figure, the excluded data pattern table 822 is composed of a plurality of records each having a pattern ID 1101, a frequency estimated value 1102, a frequency upper limit value 1103, and the section numbers (reference numerals 1104 to 1106) of each data item. The information stored in the excluded data pattern table 822 is the same as the information in the input data pattern table 811 and the frequency list 821.

FIG. 12 shows an example of the data pattern frequency table 823 output by the frequency estimation device 120 in S807. The data pattern frequency table 823 stores data patterns whose frequency upper limit value 1204 exceeds the threshold value T%. As shown in the figure, the data pattern frequency table 823 is composed of a plurality of records each having a pattern ID 1201, a frequency estimated value 1202, a cumulative frequency 1203, a frequency upper limit value 1204, and the section numbers (reference numerals 1205 to 1207) of each data item. Information other than the cumulative frequency 1203 is the same as the information in the input data pattern table 811 and the frequency list 821. In the data pattern frequency table 823, the data patterns are sorted in descending order of the frequency estimated value 1202, and the cumulative frequency 1203 of a given data pattern stores the cumulative value obtained by summing the frequency estimated values of the data patterns ranked higher than that data pattern.
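The relationship between the two output tables can be sketched as below. The exact procedure of S807 is not given in this part of the description, so the sketch rests on two assumptions: the frequency upper limit value is the quantity compared with the threshold T, and the cumulative frequency of a row includes that row's own estimate. The function name `split_and_accumulate`, the dictionary keys, and the sample values are hypothetical.

```python
def split_and_accumulate(frequency_list, threshold):
    """Split patterns by threshold and add a running cumulative frequency.

    frequency_list: list of dicts with keys pattern_id, estimate, upper_limit.
    Returns (excluded_rows, frequency_rows); frequency_rows is sorted by
    estimate in descending order and carries a "cumulative" value.
    """
    excluded = [row for row in frequency_list if row["upper_limit"] <= threshold]
    kept = sorted(
        (row for row in frequency_list if row["upper_limit"] > threshold),
        key=lambda row: row["estimate"],
        reverse=True,
    )
    frequency_rows = []
    running_total = 0.0
    for row in kept:
        running_total += row["estimate"]
        frequency_rows.append({**row, "cumulative": running_total})
    return excluded, frequency_rows


frequency_list = [
    {"pattern_id": 1, "estimate": 0.25, "upper_limit": 0.5},
    {"pattern_id": 2, "estimate": 0.5, "upper_limit": 0.75},
    {"pattern_id": 3, "estimate": 0.0, "upper_limit": 0.0},
]
excluded, frequency_rows = split_and_accumulate(frequency_list, threshold=0.125)
```

Reading the last cumulative value of the rows selected as test cases gives the share of actual data patterns that those test cases would cover.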

When performing a software test on the business system 100, for example, the data patterns stored in the data pattern frequency table 823 are employed as test cases in descending order of appearance frequency. As a result, it is possible to improve the efficiency of the test process while increasing the coverage of the data patterns that can actually occur in the business system 100, and to improve the quality of the business system 100. In addition, by referring to the cumulative frequency 1203, the user or the like can grasp to what extent the data patterns included in the business DB 101 as a whole are covered when tests are performed up to a certain data pattern. When a selected data pattern is used as a test case, it is necessary to perform processing such as replacing each section number with a representative value of the section before inputting it to the system. Specifically, for example, the boundary values in the data specification table 211 are referred to, and the minimum value, maximum value, median value, or the like of the section indicated by the section number is used as the representative value.
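Replacing section numbers with representative values can be sketched as follows, here using the midpoint of a section as the representative. The names `representative_value` and `to_test_input`, the sample boundary values, and the numbering convention (section 1 lies below the first boundary value, so section k for k ≥ 2 lies between the (k−1)-th and k-th boundary values) are assumptions for illustration. Open-ended sections below the first or above the last boundary value have no midpoint and are rejected by this sketch.

```python
def representative_value(section, boundaries):
    """Return the midpoint of a section bounded on both sides.

    Assumes section 1 is the open-ended range below boundaries[0].
    """
    if not 2 <= section <= len(boundaries):
        raise ValueError("section has no finite lower and upper boundary")
    low, high = boundaries[section - 2], boundaries[section - 1]
    return (low + high) / 2


def to_test_input(pattern, boundaries_by_did):
    """Convert {DID: section number} into {DID: representative value}."""
    return {
        did: representative_value(section, boundaries_by_did[did])
        for did, section in pattern.items()
    }


# Hypothetical boundary values for two data items and one selected pattern.
boundaries_by_did = {2: [0, 12, 24], 3: [1000, 5000, 10000]}
pattern = {2: 2, 3: 3}
test_input = to_test_input(pattern, boundaries_by_did)
```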

Next, details of the data pattern frequency estimation process S804 of FIG. 8 will be described with reference to the flowchart of FIG. 13.

First, the frequency estimation device 120 assigns 1 to a variable m indicating the DID 401 of the first element and a variable j indicating the number of updates of the appearance frequency estimation value (S1301).

Subsequently, the frequency estimation apparatus 120 substitutes m + 1 for a variable n indicating the DID 401 of the second element (S1302). The reason why "m + 1" is substituted for the variable n is that the object to be considered is a combination of the two variables m and n, and it is sufficient to consider only the case m < n.

Subsequently, the frequency estimation apparatus 120 refers to the input data pattern table 811, takes the data pattern whose pattern ID 1001 is equal to the variable i, and assigns the section number of the data item whose DID 401 is m to the variable B(m). Similarly, it assigns the section number of the data item whose DID 401 is n, in the same data pattern, to the variable B(n) (S1303).

Subsequently, the frequency estimation apparatus 120 reads, from the N-dimensional frequency distribution table 221, the frequency 503 of the data pattern corresponding to the section number B(m) of the first element and the section number B(n) of the second element (S1304). Hereinafter, this value is written as the variable $F_{m,n}(B(m), B(n))$.

Subsequently, the frequency estimation apparatus 120 updates the estimated value of the appearance frequency of the i-th data pattern (S1305). Here, the frequency estimation device 120 obtains the estimated value $E(j)$ of the appearance frequency from the estimated value $E(j-1)$ of the (j−1)-th update and the variable $F_{m,n}(B(m), B(n))$ read in S1304, using the following equation.
$$E(j) = \frac{(j-1)\,E(j-1) + e(j)}{j}$$
Here, $e(j)$ is the provisional frequency estimate obtained in the j-th update, and $E(j)$ is the average of $e(1)$ to $e(j)$. $e(j)$ is the provisional estimate obtained under the assumption that B(m) and B(n) are correlated and that the other data items are independent of each other, and is given by the following equation.
$$e(j) = P_{all} \cdot \frac{F_{m,n}(B(m), B(n)) \cdot S}{F_m(B(m)) \cdot F_n(B(n))}$$
Here, $S$ is the total number of data included in the business DB table 212, and matches the sum of the frequencies 503 in the N-dimensional frequency distribution table 221. $F_m(B(m))$ is the number of appearances of the section number B(m) for the data item m. For an arbitrary data item n (DID n), it is expressed by the following equation, in which the sum runs over all section numbers of the data item n.
$$F_m(B(m)) = F_{m,n}(B(m), 1) + \cdots + F_{m,n}(B(m), M)$$
$P_{all}$ is the frequency of the i-th data pattern obtained when the section numbers of all data items are assumed to be independent of each other, and is expressed by the following equation.
$$P_{all} = \frac{F_1(B(1))}{S} \times \frac{F_2(B(2))}{S} \times \cdots \times \frac{F_M(B(M))}{S}$$
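The expression for the provisional estimate can be checked by a short derivation, given here as a sketch that uses only the definitions above. If only the data items m and n are treated as correlated, their joint frequency is taken directly from the N-dimensional frequency distribution, and every other data item k contributes its own independent factor:

$$
e(j) = \frac{F_{m,n}(B(m),B(n))}{S} \prod_{k \neq m,n} \frac{F_k(B(k))}{S}
     = \frac{F_{m,n}(B(m),B(n))}{S} \cdot \frac{P_{all}}{\dfrac{F_m(B(m))}{S} \cdot \dfrac{F_n(B(n))}{S}}
     = P_{all} \cdot \frac{F_{m,n}(B(m),B(n)) \cdot S}{F_m(B(m)) \cdot F_n(B(n))}
$$

When the two items are in fact independent, $F_{m,n}(B(m),B(n))/S = (F_m(B(m))/S)(F_n(B(n))/S)$ and the expression reduces to $P_{all}$.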

Subsequently, the frequency estimation apparatus 120 updates the upper limit value $U(j)$ of the frequency of the i-th data pattern (S1306). The upper limit value $U(j)$ is obtained by the following equation from the upper limit value $U(j-1)$ of the (j−1)-th update and the variable $F_{m,n}(B(m), B(n))$ read in S1304.
$$U(j) = \mathrm{MIN}\left(U(j-1),\ \frac{F_{m,n}(B(m), B(n))}{S}\right)$$

Here, MIN() is a function that returns the minimum of its arguments, so once all pairs have been processed the upper limit value $U(j)$ holds the minimum of $F_{m,n}(B(m), B(n))/S$ over all m and n. $F_{m,n}(B(m), B(n))/S$ represents the appearance frequency of the data pattern when only the data items m and n are constrained. Consequently, the appearance frequency of a data pattern in which the data items other than m and n are also specified is at most $F_{m,n}(B(m), B(n))/S$. For this reason, when the minimum of $F_{m,n}(B(m), B(n))/S$ over all m and n is taken as $U(j)$, the actual appearance frequency of the data pattern is guaranteed to be $U(j)$ or less.

A specific example is shown. Suppose that the data items are "age", "pension enrollment period", and "average monthly income", and that the appearance frequencies of the data patterns combining two of these items are 10% for "age = 30 years old, pension enrollment period = 10 years", 0.5% for "age = 30 years old, average monthly income = 300,000 yen", and 1.0% for "pension enrollment period = 10 years, average monthly income = 300,000 yen". In this case, the upper limit of the appearance frequency of the data pattern combining the three items, "age = 30 years old, pension enrollment period = 10 years, average monthly income = 300,000 yen", is the minimum of these values, 0.5%.
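The example above can be reproduced with a few lines of code. This is a minimal sketch: the function name and the dictionary layout are illustrative, and the initial value of 1.0 for the upper limit is an assumption, since the text does not state how U is initialized.

```python
def frequency_upper_limit(pair_frequencies):
    """Return the minimum of the pairwise appearance frequencies.

    Each value is assumed to be already divided by the total count S,
    i.e. it is a fraction between 0 and 1.
    """
    upper = 1.0  # assumed initial value: a frequency cannot exceed 100%
    for freq in pair_frequencies.values():
        upper = min(upper, freq)  # U(j) = MIN(U(j-1), F/S)
    return upper


pairs = {
    ("age", "pension enrollment period"): 0.10,
    ("age", "average monthly income"): 0.005,
    ("pension enrollment period", "average monthly income"): 0.010,
}
print(frequency_upper_limit(pairs))  # 0.005
```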

Subsequently, the frequency estimation apparatus 120 adds 1 to the variable n, adds 1 to the variable j (S1307), and determines whether the variable n exceeds the total number M of elements (S1308). If the variable n exceeds M (S1308: Yes), the process proceeds to S1309. If the variable n is M or less (S1308: No), the process returns to S1303.

Subsequently, the frequency estimation apparatus 120 adds 1 to the variable m (S1309), and determines whether or not the variable m exceeds M−1 (S1310). If the variable m exceeds M−1 (S1310: Yes), the process proceeds to S1311. If the variable m is equal to or less than M−1 (S1310: No), the process returns to S1302.

In S1311, the frequency estimation apparatus 120 outputs the frequency estimation value E (j) and the frequency upper limit U (j) updated repeatedly in S1305 and S1306 to the frequency list 821. Specifically, the frequency estimation apparatus 120 writes the pattern ID 1001 of the data pattern and the corresponding E (j) as the frequency estimation value 1002 and U (j) as the frequency upper limit value 1003, respectively. The data pattern frequency estimation process S804 in FIG. 8 is performed as described above.
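The loop of S1301 to S1311 can be sketched in code as follows. This is an illustration under stated assumptions rather than the implementation of the frequency estimation device 120: the function names, the use of in-memory counters in place of the N-dimensional frequency distribution table 221, the 0-based item indices, the initial upper limit of 1.0, and the sample records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_tables(records):
    """Count single-item and two-item section frequencies from raw records."""
    item_count = len(records[0])
    marginal = [Counter(r[m] for r in records) for m in range(item_count)]
    pair_freq = {
        (m, n): Counter((r[m], r[n]) for r in records)
        for m, n in combinations(range(item_count), 2)
    }
    return marginal, pair_freq, len(records)


def estimate_pattern(pattern, marginal, pair_freq, total):
    """Return (E, U) for one data pattern given as a tuple of section numbers."""
    item_count = len(pattern)
    # P_all: frequency assuming every data item is independent.
    p_all = 1.0
    for m in range(item_count):
        p_all *= marginal[m][pattern[m]] / total
    estimate, upper, j = 0.0, 1.0, 0  # upper = 1.0 is an assumed start value
    for m, n in combinations(range(item_count), 2):  # all pairs with m < n
        j += 1
        f_mn = pair_freq[(m, n)][(pattern[m], pattern[n])]
        f_m = marginal[m][pattern[m]]
        f_n = marginal[n][pattern[n]]
        # Provisional estimate e(j); zero if a section never appears.
        e_j = p_all * f_mn * total / (f_m * f_n) if f_m and f_n else 0.0
        estimate = ((j - 1) * estimate + e_j) / j  # running average E(j)
        upper = min(upper, f_mn / total)           # upper limit U(j)
    return estimate, upper


# Four illustrative records with three data items each.
records = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 2, 2)]
marginal, pair_freq, total = build_tables(records)
estimate, upper = estimate_pattern((1, 1, 1), marginal, pair_freq, total)
print(estimate, upper)
```

In this small example the pattern (1, 1, 1) actually occurs in one of the four records (25%), and the computed upper limit is 0.25, consistent with the guarantee that the actual frequency does not exceed U.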

Next, details of the data pattern frequency output process S807 of FIG. 8 will be described with reference to the flowchart of FIG. 14.

As shown in the figure, the frequency estimation apparatus 120 first substitutes 1 for a variable i (S1401), and reads a data pattern whose pattern ID matches i from the frequency list 821 (S1402).

Subsequently, the frequency estimation device 120 compares the appearance frequency of the read data pattern (hereinafter referred to as "the data pattern") with a preset appearance frequency threshold T (0% ≦ T < 100%) (S1403). When the appearance frequency of the data pattern exceeds the threshold T (S1403: Yes), the frequency estimation apparatus 120 adds the data pattern to the data pattern frequency table 823 (S1405). At this time, the frequency estimation device 120 acquires the frequency estimated value 1002 and the frequency upper limit value 1003 of the data pattern from the frequency list 821 and sets them in the data pattern frequency table 823. Further, the frequency estimation apparatus 120 acquires the section numbers 902 to 904 of the record whose pattern ID is i from the input data pattern table 811 and sets them in the data pattern frequency table 823. Thereafter, the process proceeds to S1406.

On the other hand, when the appearance frequency of the data pattern is equal to or less than the threshold T (S1403: No), the frequency estimation device 120 adds the data pattern to the excluded data pattern table 822. At this time, the frequency estimation apparatus 120 acquires the frequency estimated value 1002 and the frequency upper limit value 1003 of the data pattern from the frequency list 821 and sets them in the excluded data pattern table 822. Further, the frequency estimation apparatus 120 acquires the section numbers 902 to 904 of the records with the pattern ID i from the input data pattern table 811 and sets them in the excluded data pattern table 822. Thereafter, the process proceeds to S1406.

In S1406, the frequency estimation apparatus 120 adds 1 to the variable i (S1406), and determines whether the variable i exceeds the total number of data patterns included in the frequency list 821 (S1407). If the variable i exceeds the total number of data patterns (S1407: Yes), the process proceeds to S1408. If the variable i is equal to or less than the total number of data patterns (S1407: No), the process returns to S1402.

In S1408, the frequency estimation device 120 sorts the data patterns included in the data pattern frequency table 823 in descending order of the frequency estimated value 1202. In S1409, the frequency estimation apparatus 120 calculates, for each data pattern, the sum of the frequency estimated values 1202 from the data pattern having the largest frequency estimated value down to that data pattern, and stores the calculated value in the cumulative frequency 1203 of that data pattern. The data pattern frequency output process S807 of FIG. 8 is performed as described above.
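The output process S1401 to S1409 can be sketched as follows. The record layout, the function name, and the sample values are hypothetical, and the sketch assumes that the value compared with the threshold T is the frequency estimated value.

```python
from itertools import accumulate


def split_and_rank(frequency_list, threshold):
    """Split patterns by threshold, then sort and accumulate the kept ones.

    frequency_list: list of dicts with 'pattern_id', 'estimate', 'upper_limit'.
    Returns (ranked, excluded); each ranked row gains a 'cumulative' field.
    """
    kept = [row for row in frequency_list if row["estimate"] > threshold]
    excluded = [row for row in frequency_list if row["estimate"] <= threshold]
    # S1408: descending order of the frequency estimate.
    kept.sort(key=lambda row: row["estimate"], reverse=True)
    # S1409: running sum from the top-ranked pattern down to each pattern.
    totals = accumulate(row["estimate"] for row in kept)
    ranked = [dict(row, cumulative=total) for row, total in zip(kept, totals)]
    return ranked, excluded


frequency_list = [
    {"pattern_id": 1, "estimate": 0.05, "upper_limit": 0.08},
    {"pattern_id": 2, "estimate": 0.20, "upper_limit": 0.25},
    {"pattern_id": 3, "estimate": 0.001, "upper_limit": 0.002},
    {"pattern_id": 4, "estimate": 0.10, "upper_limit": 0.15},
]
ranked, excluded = split_and_rank(frequency_list, threshold=0.01)
for row in ranked:
    print(row["pattern_id"], row["estimate"], round(row["cumulative"], 3))
```

Reading the cumulative column from the top shows how much of the estimated input distribution is covered if testing stops at a given row.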

= Second Embodiment =
In the second embodiment, instead of inputting manually set data patterns as in the first embodiment, a depth-first search is performed over all data patterns that can be input to the test target software. The appearance frequency of each acquired data pattern is estimated based on the N-dimensional frequency distribution, and the X data patterns with the highest appearance frequencies are output. However, since the number of possible data patterns is enormous, it is difficult to estimate the appearance frequency for all patterns. Therefore, in the present embodiment, data patterns with low frequency are pruned in the course of searching for the data patterns whose frequency is to be estimated, so that the top X data patterns are extracted efficiently.

FIG. 15 is a flowchart for explaining the data pattern frequency estimation process S1500 shown as the second embodiment.

As shown in the figure, the frequency estimation apparatus 120 first substitutes 1 as an initial value for a variable i (S1501). Subsequently, the frequency estimation device 120 estimates the appearance frequency of the i-th data pattern (S1502). This process is performed by the same method as the data pattern frequency estimation process S804 of FIG. 13 of the first embodiment. The frequency estimation device 120 calculates a frequency estimation value 1002 and a frequency upper limit value 1003 for the i-th data pattern, and stores them in the frequency list 821 in association with the pattern ID.

Subsequently, the frequency estimation device 120 searches for a data pattern for estimating the appearance frequency (S1503), and substitutes the pattern ID of the searched data pattern for the variable i (S1504). In this embodiment, since the input data pattern table 811 is not used, the frequency estimation device 120 automatically generates a unique pattern ID for each data pattern to be searched.

FIG. 16 shows an example of a tree structure 1600 that the frequency estimation device 120 refers to when searching for a data pattern. Here, as an example, it is assumed that there are three data items of “birth date”, “subscription period”, and “average payment amount”.

First, the frequency estimation device 120 generates a node for each section number of the first data item "birth date" and connects each node directly under the root 1600. Here, a node 1601 and a node 1607 are added. Subsequently, the frequency estimation device 120 generates nodes 1602 and 1605 for each section number of "subscription period", which is the next data item, and connects them under the node 1601. Thereafter, the tree is extended downward (toward the branches ahead) in the same manner while increasing the number of elements.

In the data pattern search, the frequency estimation device 120 searches the tree structure by a depth-first search. That is, when the search is started from a certain node, the child nodes of that node are searched preferentially, and after all the deeper nodes have been searched, the search returns to the parent node and continues. In the example of FIG. 16, the frequency estimation device 120 searches the tree structure in the order of node 1601, node 1602, node 1603, node 1604, ..., node 1605, node 1606, and so on.

However, if such a search is performed on all data patterns, the amount of calculation becomes enormous and the processing load increases. Therefore, in the present embodiment, the search efficiency is improved by not searching for nodes that are assumed not to be in the top X from the estimated frequency of appearance of each node.

Here, since the data pattern corresponding to a child node is a subset of the data pattern corresponding to its parent node, the frequency estimated value and the frequency upper limit value of the child node are guaranteed to be no larger than those of the parent node. Using this property, the frequency estimation device 120 compares the frequency estimated value of the data pattern ranked X-th among the data patterns whose appearance frequency has been estimated so far in the search with the frequency estimated value of the parent node about to be expanded. If the estimated value of the parent node is smaller, its child nodes are not searched (the branch is pruned).
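The pruned depth-first search can be sketched as follows. This is an illustration, not the search performed by the frequency estimation device 120: the function names are hypothetical, the `score` callable stands in for the frequency estimation of S1502 and is assumed never to increase when a pattern is extended by a further data item, and the example scores prefixes by exact counting over a small list of made-up records.

```python
import heapq


def top_x_patterns(sections_per_item, score, x):
    """Depth-first search that keeps the X most frequent complete patterns.

    sections_per_item: one list of section numbers per data item.
    score(prefix): hypothetical callable returning the frequency of a
        partial pattern (tuple of section numbers for the first items).
    """
    best = []  # min-heap of (frequency, pattern), at most x entries

    def visit(prefix):
        freq = score(prefix)
        # Prune: nothing below this node can beat the current X-th best.
        if len(best) == x and freq < best[0][0]:
            return
        depth = len(prefix)
        if depth == len(sections_per_item):  # complete data pattern (leaf)
            if len(best) < x:
                heapq.heappush(best, (freq, prefix))
            elif freq > best[0][0]:
                heapq.heapreplace(best, (freq, prefix))
            return
        for section in sections_per_item[depth]:
            visit(prefix + (section,))  # descend before moving to siblings

    visit(())
    return sorted(best, reverse=True)


# Made-up records; exact prefix counting replaces the estimator here.
records = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 2, 2), (1, 1, 1)]


def prefix_frequency(prefix):
    hits = sum(1 for r in records if r[:len(prefix)] == prefix)
    return hits / len(records)


top = top_x_patterns([[1, 2], [1, 2], [1, 2]], prefix_frequency, x=2)
print(top)
```

A min-heap keeps the current X-th best frequency at `best[0]`, so the pruning test is a constant-time comparison. Patterns that tie with the X-th entry are not added in this sketch.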

In FIG. 15, after the search for data patterns is completed (S1505: YES), the frequency estimation device 120 outputs the appearance frequencies of the data patterns (S1506). This process is performed by the same method as the data pattern frequency output process S807 of FIG. 14 of the first embodiment. In addition, after the data pattern frequency table 823 is sorted by the frequency estimated value in S1408 of FIG. 14, only the top X data patterns in descending order of appearance frequency may be output instead of the appearance frequencies of all the obtained data patterns.

As described above, a list of data patterns with the highest appearance frequency from the data included in the business DB table 212 is output. Accordingly, the user or the like can know a data pattern having a high appearance frequency without preparing the input data pattern table 811 in advance as in the first embodiment, and can efficiently perform a software test.

Note that the present invention is not limited to the above-described embodiments, and includes various other modifications. For example, the above-described embodiments have been described in detail for easy understanding of the present invention, and the invention is not necessarily limited to embodiments having all the configurations described. Further, a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. In addition, other configurations can be added to, deleted from, or substituted for a part of the configuration of each embodiment.

In addition, each of the above-described configurations, functions, processing units, processing means, and the like may be realized by hardware by designing a part or all of them with, for example, an integrated circuit. Each of the above-described configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files for realizing each function can be stored in a recording device such as a memory, a hard disk, and an SSD, or a recording medium such as an IC card, an SD card, and a DVD.

Also, the control lines and information lines indicate what is considered necessary for the explanation, and not all the control lines and information lines on the product are necessarily shown. Actually, it may be considered that almost all the components are connected to each other.

DESCRIPTION OF SYMBOLS 1 Test case selection support system, 100 business system, 101 business DB, 110 Frequency distribution generation apparatus, 111 Data specification reading part, 112 Business DB reading part, 113 Frequency distribution generation part, 114 Frequency distribution output part, 115 Frequency distribution DB, 120 frequency estimation device, 121 Data pattern reading unit, 122 Frequency distribution reading unit, 123 Frequency estimation unit, 124 Data pattern search unit, 125 Data pattern frequency output unit, S300 N-dimensional frequency distribution generation process, 211 Data specification table, 401 DID, 402 Data item name, 403 Frequency distribution application presence / absence, 221 N-dimensional frequency distribution table, 212 business DB table, S305 N-dimensional frequency distribution reflection processing, S800 frequency estimation processing, 821 frequency list, 822 exclusion data pattern table, 823 data pattern frequency table

Claims

An information processing system that supports selection of a test case, wherein the information processing system generates an N-dimensional frequency distribution, which is a distribution of appearance frequencies for combinations of values that can be taken by N elements based on data input to software to be tested, and estimates an appearance frequency of a predetermined data pattern based on the N-dimensional frequency distribution.
The information processing system according to claim 1, wherein the appearance frequency is estimated based on the N-dimensional frequency distribution for each of a plurality of input data patterns, and a list in which the appearance frequency estimated for each of the data patterns is written is output.
The information processing system according to claim 2, wherein the data patterns are sorted in order of the appearance frequency estimated for each, and the cumulative values of the appearance frequencies, accumulated from the highest appearance frequency, are output together with the respective data patterns.
The information processing system according to claim 2, wherein information indicating the data patterns whose estimated appearance frequency exceeds a predetermined value, or information indicating the data patterns whose estimated appearance frequency is equal to or less than the predetermined value, is output.
The information processing system according to claim 1, wherein an appearance frequency for each of the combinations of values that can be taken by the N elements included in the data pattern is acquired from the N-dimensional frequency distribution, and the minimum value of the acquired appearance frequencies is output as an upper limit value of the appearance frequency of the data pattern.
The information processing system according to claim 1, wherein a depth-first search is performed on the data patterns that can be input to the test target software, the appearance frequency of each data pattern acquired in the search process is estimated based on the N-dimensional frequency distribution, and a predetermined number of data patterns are selected and output in descending order of the estimated appearance frequency.
The information processing system according to claim 6, wherein
a depth-first search is performed for all data patterns that can be input to the test target software, and the appearance frequency of each data pattern obtained in the search process is estimated based on the N-dimensional frequency distribution,
the appearance frequency estimated for a certain data pattern is compared with the appearance frequency of the data pattern having the X-th highest appearance frequency among the other data patterns that have been searched and whose appearance frequency has been estimated so far, and
if the appearance frequency estimated for the certain data pattern is lower than the appearance frequency of the X-th most frequently occurring data pattern, the depth-first search of the branches beyond that data pattern is terminated.
The information processing system according to claim 1, wherein the data input to the test target software is data registered in a database accessed by the software.
The information processing system according to claim 1, wherein the value that the element can take corresponds to any of a plurality of sections obtained by dividing the range of values that the element can take.
A control method for an information processing system, wherein the information processing system executes:
generating an N-dimensional frequency distribution that is a distribution of appearance frequencies for combinations of values that can be taken by N elements based on data input to software to be tested; and
estimating an appearance frequency of a predetermined data pattern based on the N-dimensional frequency distribution.