WO2015155881A1 - Information processing system, which assists with selection of test case, and control method for said information processing system - Google Patents

Information processing system, which assists with selection of test case, and control method for said information processing system

Info

Publication number
WO2015155881A1
WO2015155881A1 PCT/JP2014/060484 JP2014060484W WO2015155881A1 WO 2015155881 A1 WO2015155881 A1 WO 2015155881A1 JP 2014060484 W JP2014060484 W JP 2014060484W WO 2015155881 A1 WO2015155881 A1 WO 2015155881A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
data
information processing
processing system
data pattern
Application number
PCT/JP2014/060484
Other languages
French (fr)
Japanese (ja)
Inventor
陽介 加賀
長野 裕史
Original Assignee
株式会社 日立製作所
Application filed by 株式会社 日立製作所
Priority to PCT/JP2014/060484
Publication of WO2015155881A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/28Error detection; Error correction; Monitoring by checking the correct order of processing

Definitions

  • the present invention relates to an information processing system that supports selection of a test case and a control method thereof.
  • Patent Document 1 describes generating test data that covers all patterns for combinations of any N parameters (M ≥ N > 1) out of a total of M parameters that can be input to a program, in order to generate program test data with a small amount of data while ensuring the necessary diversity.
  • In Patent Document 1, test data is generated so as to cover all combinations of arbitrary N parameters. For this reason, if, for example, most of the defects in the software are caused by combinations of N or fewer parameters, most of them can be detected with a small number of generated test cases.
  • However, if the defects in the software occur with combinations of more than N parameters, they cannot be detected sufficiently.
  • In that case, the number of detectable defects can be increased by increasing N.
  • However, as the number of test cases increases, the load of the test increases, and the time required for the test also increases.
  • Moreover, since the method of Patent Document 1 is intended to cover every combination of N parameters, it is difficult to grasp quantitatively how much of the data patterns that can actually occur are verified by a test. In addition, if the data actually in use could be accessed when generating test cases, it would be possible to determine how frequently each test case occurs; however, the data actually in use often contains confidential information such as personal information, and in many cases such data cannot be used effectively.
  • An object of the present invention is to support the selection of test cases for efficiently and surely testing software, and to improve the work efficiency of software development and repair.
  • One aspect of the present invention for achieving the above object is an information processing system that supports selection of a test case, wherein, based on data input to the software to be tested, an N-dimensional frequency distribution, which is the distribution of the appearance frequencies of combinations of values that N elements can take, is generated, and the appearance frequency of a predetermined data pattern is estimated based on the N-dimensional frequency distribution.
  • According to the present invention, it is possible to support selection of test cases for efficiently and reliably testing software, and to improve work efficiency in software development and repair.
  • FIG. 1 is a diagram illustrating a schematic configuration of a test case selection support system 1.
  • FIG. 2 is a hardware configuration example of the information processing apparatus 200 that can be used as the frequency distribution generation device 110 and the frequency estimation device 120. FIG. 3 is a flowchart explaining the N-dimensional frequency distribution generation processing S300.
  • FIG. 1 shows a schematic configuration of an information processing system (hereinafter referred to as a test case selection support system 1) described as the first embodiment.
  • The test case selection support system 1 is used, for example, when the business system 100 used in actual business is modified and a test (acceptance test) is carried out on the modified software (including programs and data) or on newly added software.
  • The test case selection support system 1 uses data registered in a database of the business system 100 (hereinafter also referred to as the business DB 101) to generate information that serves as a criterion when a user selects test cases. Specifically, the test case selection support system 1 performs statistical processing on the data stored in the business DB 101, obtains the distribution of the appearance frequencies of combinations of N elements (hereinafter also referred to as an N-dimensional frequency distribution), and uses the generated N-dimensional frequency distribution to generate the information that serves as the criterion for selecting test cases.
  • The N-dimensional frequency distribution does not include highly confidential information such as personal information, and its data volume is far smaller than that of the data stored in the business DB 101. Therefore, the N-dimensional frequency distribution can be taken out of the operation site of the business system 100 and used at the software development site, so that tests can be performed safely and efficiently. Moreover, since the test case selection support system 1 generates the selection criterion from data registered in the business DB 101 used for actual business, the coverage of data patterns that can actually occur can be increased and high-quality tests can be performed.
  • the test case selection support system 1 includes a frequency distribution generation device 110 and a frequency estimation device 120. These are all realized by using one or more information processing apparatuses (computers).
  • In the business system 100, a business DB 101, which is a database for managing data used in actual business, is operating.
  • In the business system 100, various software (not shown) that performs information processing using the business DB 101 also runs. Data is exchanged between the business system 100 and the frequency distribution generation device 110, and between the frequency distribution generation device 110 and the frequency estimation device 120, via, for example, communication means or a recording medium.
  • the business system 100, the frequency distribution generation device 110, and the frequency estimation device 120 may be realized by independent information processing devices, or any two or more may be realized by the same information processing device.
  • the frequency distribution generation device 110 reads data (or a data set) from the business DB 101 and generates an N-dimensional frequency distribution based on the read data.
  • the frequency estimation device 120 reads the N-dimensional frequency distribution generated by the frequency distribution generation device 110 and estimates the appearance frequency for an arbitrary data pattern.
  • FIG. 2 is an example of an information processing device (computer) that implements the frequency distribution generation device 110 and the frequency estimation device 120.
  • the information processing apparatus 200 includes a processor 201, a main storage device 202, an auxiliary storage device 203, an input device 204, a display device 205, and a communication device 206. These are communicably connected via communication means such as a bus (not shown).
  • The processor 201 is configured using, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). Various functions of the information processing apparatus 200 are realized by the processor 201 reading and executing a program stored in the main storage device 202.
  • the main storage device 202 is a device that stores programs and data, and is, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), an NVRAM (Non Volatile RAM), or the like.
  • the auxiliary storage device 203 is a hard disk drive, an SSD (Solid State Drive), an optical storage device, or the like. Programs and data stored in the auxiliary storage device 203 are loaded into the main storage device 202 as needed.
  • the input device 204 is a user interface that receives input of information and instructions from the user, and is, for example, a keyboard, a mouse, or a touch panel.
  • The display device 205 is a user interface that provides information to the user, and is, for example, a graphics card, a liquid crystal monitor, or an LCD (Liquid Crystal Display).
  • the communication device 206 is a communication interface that communicates with other devices via a communication network, and is, for example, a NIC (Network Interface Card).
  • the frequency distribution generation device 110 includes functions of a data specification reading unit 111, a business DB reading unit 112, a frequency distribution generation unit 113, a frequency distribution output unit 114, and a frequency distribution DB 115. These functions are realized by the processor 201 reading and executing a program stored in the main storage device 202.
  • the data specification reading unit 111 reads the contents of the data specification table 211. Details of the data specification table 211 will be described later.
  • the business DB reading unit 112 reads data from the business DB 101 while referring to the data specifications read by the data specification reading unit 111.
  • the frequency distribution generation unit 113 performs statistical processing on the data read by the business DB reading unit 112 and generates an N-dimensional frequency distribution.
  • the frequency distribution output unit 114 outputs the N-dimensional frequency distribution generated by the frequency distribution generation unit 113 to the frequency distribution DB 115.
  • the frequency distribution DB 115 is a database (hereinafter also referred to as “DB”) managed by a DBMS (DataBase Management System).
  • the frequency distribution DB 115 stores the N-dimensional frequency distribution output by the frequency distribution output unit 114 and provides the frequency estimation apparatus 120 with the contents of the N-dimensional frequency distribution.
  • the frequency estimation device 120 includes functions of a data pattern reading unit 121, a frequency distribution reading unit 122, a frequency estimation unit 123, a data pattern search unit 124, and a data pattern frequency output unit 125. These functions are realized by the processor 201 reading and executing a program stored in the main storage device 202.
  • the data pattern reading unit 121 reads a data pattern that is a target of appearance frequency estimation from the input data pattern table 811.
  • the frequency distribution reading unit 122 reads the N-dimensional frequency distribution generated by the frequency distribution generation device 110 and stored in the frequency distribution DB 115.
  • the frequency estimation unit 123 estimates the appearance frequency of the data pattern read by the data pattern reading unit 121 based on the N-dimensional frequency distribution read by the frequency distribution reading unit 122.
  • The data pattern search unit 124 performs a depth-first search for data patterns that can be input to the software to be tested, estimates the appearance frequency of each data pattern obtained during the search based on the N-dimensional frequency distribution, and thereby extracts data patterns efficiently.
  • the data pattern frequency output unit 125 extracts a data pattern having a high appearance frequency based on the appearance frequency of the data pattern estimated by the frequency estimation unit 123, and outputs the extracted data pattern.
  • FIG. 3 is a flowchart for explaining processing performed by the frequency distribution generation device 110 (hereinafter also referred to as N-dimensional frequency distribution generation processing S300).
  • By performing the N-dimensional frequency distribution generation processing S300, the frequency distribution generation device 110 selects N elements (hereinafter also referred to as the N elements) from data composed of M elements in the business DB 101 and generates an N-dimensional frequency distribution using the selected N elements.
  • The dimension N of the frequency distribution is in the range 0 < N ≤ M.
  • the N-dimensional frequency distribution is generated for all combinations in which N elements are extracted from M elements.
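  • As a rough illustration of generating a distribution for every combination of N elements out of M, the following Python sketch enumerates those combinations. The function name element_combinations and the DID values are hypothetical and are not taken from the specification.

```python
from itertools import combinations
from math import comb


def element_combinations(applied_dids, n):
    """Return every way of choosing n data items (by DID) from the applied ones."""
    return list(combinations(sorted(applied_dids), n))


# Hypothetical DIDs of data items whose frequency distribution flag is 1.
applied = [2, 3, 4]
pairs = element_combinations(applied, 2)
print(pairs)                                # one 2-dimensional distribution per pair
print(len(pairs) == comb(len(applied), 2))  # the count is "M choose N"
```

  • The number of distributions grows as the binomial coefficient of M and N, which is one reason a small N keeps the output compact.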
  • the frequency distribution generation device 110 first reads the contents of the data specification table 211 (S301).
  • FIG. 4 shows an example of the data specification table 211.
  • the data specification table 211 defines the specifications of data registered in the business DB 101.
  • the data specification table 211 is set in advance by a user or the like, for example.
  • Data item ID (hereinafter also referred to as DID 401) is an identifier assigned to each data item of data registered in the business DB 101.
  • the data item name 402 is the name of the data item included in the business DB 101. For example, a character string such as “name”, “date of birth”, “subscription period”, and “average payment amount” is set.
  • The frequency distribution application presence / absence 403 is a flag indicating whether or not a frequency distribution is generated for the data item: “1” is set when the frequency distribution is generated, and “0” is set when it is not. In the example of FIG. 4, the character string “name” is not a target of frequency distribution generation, so “0” is set in the frequency distribution application presence / absence 403; the other data items are targets, so “1” is set for each of them.
  • Boundary value 404 is a list of boundary values that divides the range of values that each data item can take.
  • For example, values used in equivalence partitioning, a technique from the field of software testing, are set as the boundary values.
  • In equivalence partitioning, values at which the behavior of the software changes are listed as boundary values, and representative values are extracted from the sections divided by the boundary values and used for testing. This makes it possible to verify the behavior of the software with a minimum number of tests. For example, in FIG. 4, the values “0”, “12”,..., “480”, at which the behavior of the software changes, are set as the boundary values of the “subscription period”.
  • The boundary values are set by a user or the like based on, for example, the software specifications or source code. If it is difficult to set them from the software specifications or source code, they may be set mechanically, for example as 10 values at equal intervals from the minimum value to the maximum value.
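  • The mechanical fallback mentioned above can be sketched as follows. This is only one possible reading: it assumes that the minimum and maximum themselves are included among the 10 values, which the text does not state, and the function name is hypothetical.

```python
def equal_interval_boundaries(minimum, maximum, count=10):
    """Return `count` equally spaced boundary values from minimum to maximum.

    Assumption: both end points are included in the returned list.
    """
    if count < 2:
        raise ValueError("at least two boundary values are needed")
    step = (maximum - minimum) / (count - 1)
    return [minimum + k * step for k in range(count)]


# Hypothetical range for a numeric data item.
print(equal_interval_boundaries(0, 90))
```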
  • FIG. 5 shows an example of the N-dimensional frequency distribution table 221.
  • the figure shows only the N-dimensional frequency distribution table for a specific combination.
  • The first element 501 indicates the section number (an identifier uniquely given to each section divided by the boundary values) of the first element z of the N elements.
  • The section number identifies a section delimited by the boundary values 404 defined in the data specification table 211, and is assigned to the sections in ascending order. For example, if the boundary values 404 are “0”, “10”,..., the section number of z is “1” when z < 0 and “2” when 0 ≤ z < 10.
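  • A minimal sketch of mapping a value to its section number with a binary search is shown below. It assumes that a value equal to a boundary belongs to the section above that boundary and that the boundary list is sorted in ascending order; the function name section_number is hypothetical.

```python
from bisect import bisect_right


def section_number(value, boundaries):
    """Return the 1-based section number of `value`.

    Assumption: `boundaries` is sorted ascending and a value equal to a
    boundary falls into the section that starts at that boundary.
    """
    return bisect_right(boundaries, value) + 1


boundaries = [0, 10]                    # hypothetical boundary values
print(section_number(-1, boundaries))   # below the first boundary
print(section_number(5, boundaries))    # between the two boundaries
```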
  • the Nth element 502 indicates the section number of the Nth element among the N elements.
  • In the frequency 503, the number of data items registered in the business DB 101 that belong to the corresponding sections of the first element 501 through the Nth element 502 is set.
  • The frequency distribution generation apparatus 110 substitutes 1 for a variable i (S303), and reads the record whose ID is i (hereinafter also referred to as business DB data) from the table of the business DB 101 (hereinafter referred to as the business DB table 212) (S304).
  • FIG. 6 shows an example of the business DB table 212.
  • this business DB table 212 is composed of a plurality of records having items such as ID 601, name 602, date of birth 603, subscription period 604, and average payment amount 605.
  • the ID 601 is an identifier (hereinafter also referred to as a record ID) assigned to each record of the business DB 101.
  • The frequency distribution generation device 110 then reflects the contents of the obtained i-th business DB data in the N-dimensional frequency distribution table 221, which is a temporary table (S305).
  • The details of this process (hereinafter also referred to as N-dimensional frequency distribution reflection process S305) will be described later.
  • the frequency distribution generation device 110 adds 1 to the variable i (S306), and determines whether or not the variable i exceeds the number of data in the business DB 101 (or a preset number of repetitions) (S307). If the variable i exceeds the number of data in the business DB 101 (S307: Yes), the process proceeds to S308. If the variable i does not exceed the number of data in the business DB 101 (S307: No), the process returns to S304.
  • The frequency distribution generation device 110 outputs the contents of the temporary N-dimensional frequency distribution table 221 to the N-dimensional frequency distribution table 221 (S308).
  • FIG. 7 is a flowchart for explaining the details of the N-dimensional frequency distribution reflecting process S305 in FIG.
  • the N-dimensional frequency distribution reflecting process S305 will be described below with reference to FIG.
  • In the following, the case of N = 2 is described as an example.
  • the frequency distribution generation device 110 substitutes “1” for the variable m (S701).
  • This variable m corresponds to the DID 401 of the first element described above.
  • The frequency distribution generation device 110 refers to the data specification table 211, reads the value of the frequency distribution application presence / absence 403 of the data item whose DID 401 is m, and determines whether or not the read value is “1” (S702). When the read value is “1”, the process proceeds to S703, and when the read value is “0”, the process proceeds to S709.
  • The frequency distribution generation device 110 substitutes “m + 1” for the variable n (S703).
  • This variable n corresponds to the DID 401 of the second element.
  • The reason for “m + 1” is that the object under consideration is a combination of the two variables m and n, so it is sufficient to consider only the case m < n.
  • The frequency distribution generation device 110 refers to the data specification table 211, reads the value of the frequency distribution application presence / absence 403 of the data item whose DID 401 is n, and determines whether or not it is “1” (S704).
  • If the value of the frequency distribution application presence / absence 403 is “1” (S704: Yes), the process proceeds to S705.
  • If the value of the frequency distribution application presence / absence 403 is “0” (S704: No), the process proceeds to S707.
  • The frequency distribution generation device 110 extracts the value whose DID 401 is m from the business DB data whose ID 601 (record ID) is i in the business DB table 212, compares the extracted value with the data specification table 211, and assigns the corresponding section number to the variable s.
  • Similarly, the frequency distribution generation apparatus 110 extracts the value whose DID 401 is n from the business DB data whose ID 601 is i in the business DB table 212, compares the extracted value with the data specification table 211, and assigns the corresponding section number to the variable t.
  • the frequency distribution generation device 110 adds 1 to the variable n (S707), and determines whether the variable n exceeds the total number of data items in the business DB table 212 (S708). When the variable n exceeds the total number of data items in the business DB table 212 (S708: Yes), the processing from S709 is performed. If the variable n does not exceed the total number of data items in the business DB table 212 (S708: No), the processing from S704 is performed.
  • the frequency distribution generation device 110 adds 1 to the variable m (S709), and determines whether the variable m exceeds the total number of data items in the business DB table 212 (S710). If the variable m exceeds the total number of data items (S710: Yes), the process ends. When the variable m does not exceed the total number of data items (S710: No), the processing from S702 is performed.
  • the data stored in the business DB 101 is converted into an N-dimensional frequency distribution.
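  • The conversion described above can be sketched for N = 2 as follows. This is an illustrative sketch, not the implementation in the specification: it assumes that "reflecting" a record means incrementing the count for the section pair (s, t), and the data layout, function names and sample values are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def section_number(value, boundaries):
    """1-based section number; a boundary value opens the upper section (assumed)."""
    return bisect_right(boundaries, value) + 1


def build_pairwise_distribution(records, spec):
    """records: list of {did: value}; spec: {did: (apply_flag, boundaries)}."""
    dids = sorted(spec)
    table = Counter()
    for record in records:                      # loop over records (S304 to S307)
        for pos, m in enumerate(dids):          # first element
            if spec[m][0] != 1:                 # flag check (S702)
                continue
            for n in dids[pos + 1:]:            # second element, only m < n
                if spec[n][0] != 1:             # flag check (S704)
                    continue
                s = section_number(record[m], spec[m][1])
                t = section_number(record[n], spec[n][1])
                table[(m, n, s, t)] += 1        # assumed meaning of "reflect"
    return table


# Hypothetical specification: DID 1 is a name (flag 0), DIDs 3 and 4 are numeric.
spec = {1: (0, []), 3: (1, [0, 12, 24]), 4: (1, [0, 1000])}
records = [
    {1: "A", 3: 5, 4: 500},
    {1: "B", 3: 15, 4: 500},
    {1: "C", 3: 6, 4: 1500},
]
table = build_pairwise_distribution(records, spec)
print(dict(table))
```

  • Only section numbers and counts leave the loop, which matches the point made earlier that the resulting distribution carries no names or other record-level values.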
  • Next, the processing in which the frequency estimation device 120 estimates the appearance frequency of a data pattern based on the N-dimensional frequency distribution generated as described above (hereinafter also referred to as frequency estimation processing S800) will be described.
  • FIG. 8 is a flowchart for explaining the frequency estimation process S800.
  • The frequency estimation apparatus 120 estimates the appearance frequency of each data pattern by performing the frequency estimation process S800 on preset data patterns using the N-dimensional frequency distribution table 221 generated by the frequency distribution generation apparatus 110.
  • the frequency estimation device 120 reads an input data pattern table 811 which is a table in which preset data patterns are registered (S801).
  • FIG. 9 shows an example of the input data pattern table 811.
  • a pattern ID 901 is an identifier (hereinafter also referred to as a pattern ID) that is uniquely assigned for each data pattern.
  • Reference numerals 902 to 904 denote the DID 401 of the data specification table 211 corresponding to each data item.
  • a numerical value described at a position where the pattern ID 901 and any of the reference numerals 902 to 904 intersect is a section number.
  • the frequency estimation device 120 estimates the appearance frequency for each of the data patterns stored in the input data pattern table 811.
  • The frequency estimation apparatus 120 then substitutes 1 for the variable i (S802), and reads the i-th data pattern from the input data pattern table 811 (S803).
  • The frequency estimation device 120 estimates the appearance frequency of the i-th data pattern and outputs it to the frequency list 821 (S804).
  • the frequency estimation device 120 estimates the appearance frequency based on the N-dimensional frequency distribution table 221 generated by the frequency distribution generation device 110. Details of this processing (hereinafter also referred to as data pattern frequency estimation processing S804) will be described later.
  • FIG. 10 shows an example of the frequency list 821.
  • the frequency list 821 includes one or more records having three items: a pattern ID 1001, a frequency estimated value 1002, and a frequency upper limit value 1003.
  • a pattern ID 1001 is a pattern ID in the input data pattern table 811.
  • the frequency estimation value 1002 is a value (probability) indicating how much the data pattern is included in the business DB 101.
  • the frequency upper limit value 1003 is an upper limit value (probability) of the appearance frequency of the data pattern in the business DB 101.
  • The frequency estimation apparatus 120 adds 1 to the variable i (S805), and determines whether or not the variable i exceeds the total number of patterns stored in the input data pattern table 811 (S806). If the variable i exceeds the total number of patterns (S806: Yes), the process proceeds to S807. If the variable i does not exceed the total number of patterns (S806: No), the process returns to S803.
  • the frequency estimation apparatus 120 outputs an excluded data pattern table 822 and a data pattern frequency table 823. Details of this process (hereinafter also referred to as data pattern frequency output process S807) will be described later.
  • FIG. 11 shows an example of the excluded data pattern table 822 output by the frequency estimation device 120 in S807.
  • The excluded data pattern table 822 is a table that stores data patterns whose estimated appearance frequency has been determined to be equal to or less than a preset threshold value T%. As shown in the figure, the excluded data pattern table 822 is composed of a plurality of records each having a pattern ID 1101, a frequency estimated value 1102, a frequency upper limit value 1103, and the section numbers (reference numerals 1104 to 1106) of each data item. Information stored in the excluded data pattern table 822 is the same as the information in the input data pattern table 811 and the frequency list 821.
  • FIG. 12 shows an example of the data pattern frequency table 823 output by the frequency estimation device 120 in S807.
  • The data pattern frequency table 823 stores data patterns whose frequency upper limit value 1204 exceeds the threshold value T%.
  • The data pattern frequency table 823 is composed of a plurality of records each having a pattern ID 1201, a frequency estimated value 1202, a cumulative frequency 1203, a frequency upper limit value 1204, and the section numbers (reference numerals 1205 to 1207) of each data item. Information other than the cumulative frequency 1203 is the same as the information in the input data pattern table 811 and the frequency list 821.
  • In the data pattern frequency table 823, the data patterns are sorted in descending order of the frequency estimated value 1202, and the cumulative frequency 1203 of a given data pattern stores the cumulative sum of the frequency estimated values of the data patterns ranked above that data pattern.
  • the data patterns stored in the data pattern frequency table 823 are employed as test cases in descending order of appearance frequency. As a result, it is possible to improve the efficiency of the test process while increasing the coverage for the data patterns that can actually be taken in the business system 100, and to improve the quality of the business system 100.
  • The user or the like can thereby grasp how much of the entire set of data patterns included in the business DB 101 is covered when tests are performed up to a given data pattern.
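  • The sorting and cumulative frequency described above can be sketched as follows. The sketch assumes that the cumulative value of a row includes that row's own estimate, which the text leaves open, and the function name and frequency values are hypothetical.

```python
from itertools import accumulate


def rank_with_cumulative(estimates):
    """estimates: {pattern_id: estimated frequency}.

    Returns (pattern_id, estimate, cumulative) rows sorted by estimate,
    highest first. Assumption: the cumulative value includes the row itself.
    """
    ranked = sorted(estimates.items(), key=lambda item: item[1], reverse=True)
    totals = accumulate(freq for _, freq in ranked)
    return [(pid, freq, total) for (pid, freq), total in zip(ranked, totals)]


# Hypothetical estimated frequencies for three data patterns.
rows = rank_with_cumulative({1: 0.125, 2: 0.5, 3: 0.25})
for row in rows:
    print(row)
```

  • Reading down the last column shows the share of real data covered if every pattern up to that row is adopted as a test case.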
  • When a selected data pattern is used as a test case, processing such as replacing each section number with a representative value of the section is required before it is input to the system.
  • For example, the boundary values in the data specification table 211 are referred to, and the minimum value, maximum value, median value, or the like of the section indicated by the section number is used as the representative value.
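  • A possible way to turn a section number back into a concrete input value is sketched below. It assumes that section k spans from the (k − 1)-th boundary up to, but not including, the k-th boundary, and it handles only sections bounded on both sides; the maximum is left out because it depends on whether the upper boundary is included. The function name is hypothetical.

```python
def representative_value(section, boundaries, kind="median"):
    """Pick a test input for a 1-based section number.

    Assumes section k (2 <= k <= len(boundaries)) spans
    boundaries[k - 2] <= value < boundaries[k - 1].
    """
    if not 2 <= section <= len(boundaries):
        raise ValueError("only sections bounded on both sides are handled here")
    lower, upper = boundaries[section - 2], boundaries[section - 1]
    if kind == "min":
        return lower
    if kind == "median":
        return (lower + upper) / 2
    raise ValueError(f"unsupported kind: {kind}")


boundaries = [0, 12, 24]                      # hypothetical boundary values
print(representative_value(2, boundaries))    # midpoint of the section from 0 to 12
print(representative_value(3, boundaries, "min"))
```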
  • the frequency estimation device 120 assigns 1 to a variable m indicating the DID 401 of the first element and a variable j indicating the number of updates of the appearance frequency estimation value (S1301).
  • the frequency estimation apparatus 120 substitutes m + 1 for a variable n indicating the DID 401 of the second element (S1302).
  • The reason m + 1 is substituted for the variable n is that the object under consideration is a combination of the two variables m and n, so it is sufficient to consider only the case m < n.
  • The frequency estimation apparatus 120 substitutes into the variable B(m) the section number of the data item whose DID 401 is m in the data pattern whose pattern ID 1001 is equal to the variable i. Similarly, it substitutes into the variable B(n) the section number of the data item whose DID 401 is n in the same data pattern (S1303).
  • The frequency estimation apparatus 120 reads from the N-dimensional frequency distribution table 221 the frequency 503 corresponding to the section number B(m) of the first element and the section number B(n) of the second element (hereinafter expressed as $F_{m,n}(B(m), B(n))$) (S1304).
  • the frequency estimation apparatus 120 updates the estimated value of the appearance frequency of the i-th data pattern (S1305).
  • The frequency estimation device 120 obtains the estimated value E(j) of the appearance frequency from the following equation, based on the estimated value E(j − 1) in the (j − 1)-th update and $F_{m,n}(B(m), B(n))$ read in S1304.
  • $E(j) = \dfrac{(j - 1) \times E(j - 1) + e(j)}{j}$
  • Here, e(j) is the provisional frequency estimate obtained in the j-th update, and E(j) is the average of e(1) to e(j).
  • S is the total number of data included in the business DB table 212, and matches the sum of the frequencies 503 in the N-dimensional frequency distribution table 221.
  • $F_m(B(m))$ is the number of appearances of the section number B(m) for the data item m, and is expressed by the following equation for an arbitrary DID n.
  • $F_m(B(m)) = F_{m,n}(B(m), 1) + \dots$
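  • The update rule for E(j) is an incremental mean, which the following sketch illustrates. The provisional values e(j) are supplied as hypothetical inputs, because how each e(j) is computed is not reproduced here; the function name is also hypothetical.

```python
def update_estimate(previous, provisional, j):
    """Apply E(j) = ((j - 1) * E(j - 1) + e(j)) / j."""
    return ((j - 1) * previous + provisional) / j


provisional_values = [0.25, 0.5, 0.75]   # hypothetical e(1), e(2), e(3)
estimate = 0.0                           # E(0); multiplied by zero when j = 1
for j, e_j in enumerate(provisional_values, start=1):
    estimate = update_estimate(estimate, e_j, j)

print(estimate)
print(sum(provisional_values) / len(provisional_values))
```

  • Keeping only the running value and the counter j avoids storing every provisional estimate while still yielding their average.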
  • the frequency estimation apparatus 120 updates the upper limit value U (j) of the frequency of the i-th data pattern (S1306).
  • The upper limit value U(j) is obtained from the following equation, based on U(j − 1), the upper limit value in the (j − 1)-th update, and $F_{m,n}(B(m), B(n))$ read in S1304.
  • $U(j) = \mathrm{MIN}\left(U(j - 1),\ F_{m,n}(B(m), B(n)) / S\right)$
  • MIN() is a function that returns the minimum of its arguments; therefore, the upper limit value U(j) stores the minimum value of $F_{m,n}(B(m), B(n)) / S$ over all m and n.
  • $F_{m,n}(B(m), B(n)) / S$ represents the appearance frequency of the data pattern when only the data items m and n are constrained. Therefore, the appearance frequency of a data pattern that also specifies data items other than m and n is at most $F_{m,n}(B(m), B(n)) / S$. For this reason, when U(j) is the minimum value of $F_{m,n}(B(m), B(n)) / S$ over all m and n, the actual appearance frequency of the data pattern is guaranteed to be U(j) or less.
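  • The bound can be checked on a small example. The sketch below builds pairwise counts from a handful of hypothetical records (already converted to section numbers), applies the MIN update over all pairs m < n, and compares the result with the true frequency. The starting value U(0) = 1 and all names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_pair_tables(rows, dids):
    """Count section-number pairs for every pair of data items (m < n)."""
    tables = {pair: Counter() for pair in combinations(dids, 2)}
    for row in rows:
        sections = dict(zip(dids, row))
        for m, n in tables:
            tables[(m, n)][(sections[m], sections[n])] += 1
    return tables


def frequency_upper_limit(pattern, tables, total):
    """Apply U(j) = MIN(U(j - 1), F_mn(B(m), B(n)) / S) over all pairs m < n."""
    limit = 1.0  # assumed starting value U(0); a frequency cannot exceed 1
    for m, n in combinations(sorted(pattern), 2):
        count = tables[(m, n)][(pattern[m], pattern[n])]
        limit = min(limit, count / total)
    return limit


dids = [1, 2, 3]
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1)]  # hypothetical section numbers
tables = build_pair_tables(rows, dids)
pattern = {1: 1, 2: 1, 3: 1}
upper = frequency_upper_limit(pattern, tables, len(rows))
actual = rows.count((1, 1, 1)) / len(rows)
print(upper, actual)
```

  • In this example every pair of sections occurs twice in four records while the full pattern occurs once, so the bound holds but is not tight.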
  • the frequency estimation apparatus 120 adds 1 to the variable n, adds 1 to the variable j (S1307), and determines whether the variable n exceeds the total number M of elements (S1308). If the variable n exceeds M (S1308: Yes), the process proceeds to S1309. If the variable n is M or less (S1308: No), the process returns to S1303.
  • The frequency estimation apparatus 120 adds 1 to the variable m (S1309), and determines whether or not the variable m exceeds M − 1 (S1310). If the variable m exceeds M − 1 (S1310: Yes), the process proceeds to S1311. If the variable m is equal to or less than M − 1 (S1310: No), the process returns to S1302.
  • the frequency estimation apparatus 120 outputs the frequency estimation value E (j) and the frequency upper limit U (j) updated repeatedly in S1305 and S1306 to the frequency list 821. Specifically, the frequency estimation apparatus 120 writes the pattern ID 1001 of the data pattern and the corresponding E (j) as the frequency estimation value 1002 and U (j) as the frequency upper limit value 1003, respectively.
  • the data pattern frequency estimation process S804 in FIG. 8 is performed as described above.
  • the frequency estimation apparatus 120 first substitutes 1 for a variable i (S1401), and reads a data pattern whose pattern ID matches i from the frequency list 821 (S1402).
  • The frequency estimation device 120 compares the appearance frequency of the read data pattern (hereinafter, the data pattern in question) with a preset appearance frequency threshold T (0% ≤ T ≤ 100%) (S1403). When the appearance frequency of the data pattern exceeds the threshold T (S1403: Yes), the frequency estimation apparatus 120 adds the data pattern to the data pattern frequency table 823 (S1405). At this time, the frequency estimation device 120 acquires the frequency estimated value 1002 and the frequency upper limit value 1003 of the data pattern from the frequency list 821 and sets them in the data pattern frequency table 823. The frequency estimation apparatus 120 also acquires the section numbers 902 to 904 of the record with pattern ID i from the input data pattern table 811 and sets them in the data pattern frequency table 823. Thereafter, the process proceeds to S1406.
  • When the appearance frequency of the data pattern does not exceed the threshold T (S1403: No), the frequency estimation device 120 adds the data pattern to the excluded data pattern table 822. At this time, the frequency estimation apparatus 120 acquires the frequency estimated value 1002 and the frequency upper limit value 1003 of the data pattern from the frequency list 821 and sets them in the excluded data pattern table 822. The frequency estimation apparatus 120 also acquires the section numbers 902 to 904 of the record with pattern ID i from the input data pattern table 811 and sets them in the excluded data pattern table 822. Thereafter, the process proceeds to S1406.
  • the frequency estimation apparatus 120 adds 1 to the variable i (S1406), and determines whether the variable i exceeds the total number of data patterns included in the frequency list 821 (S1407). If the variable i exceeds the total number of data patterns (S1407: Yes), the process proceeds to S1408. If the variable i is equal to or less than the total number of data patterns (S1407: No), the process returns to S1402.
  • the frequency estimation device 120 sorts the data patterns included in the data pattern frequency table 823 in descending order of the frequency estimated value 1002.
  • the frequency estimation apparatus 120 calculates the sum of the frequency estimation values 1202 from the data pattern having the largest frequency estimation value to each data pattern, and substitutes the calculated value into the cumulative frequency 1203 of each data pattern.
  • the data pattern frequency output process S807 of FIG. 8 is performed as described above.
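The output process above (threshold split, descending sort, cumulative frequency) can be sketched as follows. This is only an illustration under stated assumptions: the function name and the dictionary keys are hypothetical, and the value compared with the threshold T is assumed to be the frequency estimate, which the text does not state explicitly.

```python
def split_and_accumulate(frequency_list, threshold):
    """Split patterns by threshold, sort the kept ones, and add cumulative sums.

    frequency_list: list of dicts with the hypothetical keys
                    'pattern_id', 'estimate' and 'upper_limit'
    threshold:      T, as a fraction between 0.0 and 1.0
    """
    kept, excluded = [], []
    for row in frequency_list:
        # Assumption: the estimate (not the upper limit) is compared with T.
        target = kept if row["estimate"] > threshold else excluded
        target.append(dict(row))

    # Sort in descending order of the frequency estimate.
    kept.sort(key=lambda r: r["estimate"], reverse=True)

    # Running total from the most frequent pattern down to each pattern.
    running = 0.0
    for row in kept:
        running += row["estimate"]
        row["cumulative"] = running
    return kept, excluded


# Illustrative values only.
rows = [
    {"pattern_id": 1, "estimate": 0.05, "upper_limit": 0.10},
    {"pattern_id": 2, "estimate": 0.30, "upper_limit": 0.35},
    {"pattern_id": 3, "estimate": 0.001, "upper_limit": 0.002},
    {"pattern_id": 4, "estimate": 0.20, "upper_limit": 0.40},
]
kept, excluded = split_and_accumulate(rows, threshold=0.01)
```

The cumulative column lets a reader see how much of the observed data the first k patterns account for, which is the figure a tester would use to decide where to stop adding test cases.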
  • Second Embodiment
  • In the second embodiment, instead of taking manually prepared data patterns as input as in the first embodiment, a depth-first search is performed over all data patterns that can be input to the test target software. The appearance frequency of each data pattern reached is estimated based on the N-dimensional frequency distribution, and the X data patterns with the highest appearance frequency are output. However, since the number of possible data patterns is enormous, it is difficult to estimate the appearance frequency for all patterns. Therefore, in the present embodiment, data patterns with low frequency are pruned while searching for data patterns whose frequency is to be estimated, so that the top X data patterns are extracted efficiently.
  • FIG. 15 is a flowchart for explaining the data pattern frequency estimation process S1500 shown as the second embodiment.
  • the frequency estimation apparatus 120 first substitutes 1 as an initial value for a variable i (S1501). Subsequently, the frequency estimation device 120 estimates the appearance frequency of the i-th data pattern (S1502). This process is performed by the same method as the data pattern frequency estimation process S804 of FIG. 13 of the first embodiment. The frequency estimation device 120 calculates a frequency estimation value 1002 and a frequency upper limit value 1003 for the i-th data pattern, and stores them in the frequency list 821 in association with the pattern ID.
  • the frequency estimation device 120 searches for a data pattern for estimating the appearance frequency (S1503), and substitutes the pattern ID of the searched data pattern for the variable i (S1504).
  • the frequency estimation device 120 automatically generates a unique pattern ID for each data pattern to be searched.
  • FIG. 16 shows an example of a tree structure 1600 that the frequency estimation device 120 refers to when searching for a data pattern.
  • In this example, it is assumed that there are three data items: "birth date", "subscription period", and "average payment amount".
  • The frequency estimation device 120 generates a node for each section number of the first data item, "birth date", and connects it directly under the root of the tree structure 1600.
  • In the example of FIG. 16, a node 1601 and a node 1607 are added in this way.
  • the frequency estimation device 120 generates nodes 1602 and 1605 for each section number of “subscription period” which is the next data item, and connects them under the node 1601.
  • The tree is extended downward (toward the tips of the branches) in the same manner, adding one data item at each level.
  • the frequency estimation device 120 searches the tree structure by a depth-first search. That is, when the search is started from a certain node, the child node of the node is preferentially searched, and after searching all the deeper nodes, the search returns to the parent node and the search is continued.
  • The frequency estimation device 120 searches the tree structure in the order node 1601, node 1602, node 1603, node 1604, ..., node 1605, node 1606, ....
  • Search efficiency is improved by not searching nodes that, judging from the estimated appearance frequency of each node, cannot be in the top X.
  • The frequency estimation value and the frequency upper limit value of a child node are guaranteed to be smaller than those of its parent node. Using this property, the frequency estimation device 120 compares the X-th largest estimate among the data patterns whose appearance frequency has been estimated so far in the search with the estimate of the parent node about to be expanded. If the estimate of the parent node is smaller, its child nodes are not searched (they are pruned).
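The pruned depth-first search can be sketched as follows. This is an illustration under assumptions, not the device's actual procedure: the function name and arguments are hypothetical, only complete patterns (leaves) are collected, and the estimator is passed in as a callable that is assumed never to increase when a pattern is extended by one more data item. The example uses a product of made-up marginal frequencies as a stand-in estimator because it has that property.

```python
import heapq


def top_x_patterns(sections_per_item, estimate, x):
    """Depth-first search over section-number prefixes, keeping the x best leaves.

    sections_per_item: list of lists, the section numbers of each data item
    estimate:          callable(prefix tuple) -> frequency, assumed not to
                       increase when the prefix is extended
    Returns (list of (estimate, pattern) in descending order, nodes visited).
    """
    best = []  # min-heap; best[0] holds the current X-th largest estimate
    visited = 0

    def visit(prefix):
        nonlocal visited
        visited += 1
        value = estimate(prefix)
        # Prune: this node is already below the X-th best, so are its children.
        if len(best) == x and value < best[0][0]:
            return
        if len(prefix) == len(sections_per_item):  # complete data pattern
            if len(best) < x:
                heapq.heappush(best, (value, prefix))
            else:
                heapq.heapreplace(best, (value, prefix))
            return
        for section in sections_per_item[len(prefix)]:
            visit(prefix + (section,))

    for section in sections_per_item[0]:
        visit((section,))
    return sorted(best, reverse=True), visited


# Stand-in estimator: product of illustrative marginal frequencies.
marginals = [{1: 0.7, 2: 0.3}, {1: 0.6, 2: 0.4}, {1: 0.9, 2: 0.1}]


def product_estimate(prefix):
    value = 1.0
    for item, section in enumerate(prefix):
        value *= marginals[item][section]
    return value


result, visited = top_x_patterns([[1, 2], [1, 2], [1, 2]], product_estimate, x=2)
```

With these values the search returns the patterns (1, 1, 1) and (1, 2, 1) and skips the subtrees under (2, 1) and (2, 2), since their estimates are already below the second-best leaf found earlier.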
  • The frequency estimation device 120 outputs the appearance frequencies of the data patterns (S1506). This process is performed in the same way as the data pattern frequency output process S807 of FIG. 14 in the first embodiment. Alternatively, after the data pattern frequency table 823 is sorted by the frequency estimation value in S1408 of FIG. 14, only the top X data patterns in descending order of appearance frequency may be output, instead of the appearance frequencies of all the obtained data patterns.
  • a list of data patterns with the highest appearance frequency from the data included in the business DB table 212 is output. Accordingly, the user or the like can know a data pattern having a high appearance frequency without preparing the input data pattern table 811 in advance as in the first embodiment, and can efficiently perform a software test.
  • the present invention is not limited to the above-described embodiments, and includes various other modifications.
  • the above-described embodiments have been described in detail for easy understanding of the present invention, and are not necessarily limited to those having all the configurations described.
  • a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment.
  • each of the above-described configurations, functions, processing units, processing means, and the like may be realized by hardware by designing a part or all of them with, for example, an integrated circuit.
  • Each of the above-described configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function.
  • Information such as programs, tables, and files for realizing each function can be stored in a recording device such as a memory, a hard disk, and an SSD, or a recording medium such as an IC card, an SD card, and a DVD.
  • control lines and information lines indicate what is considered necessary for the explanation, and not all the control lines and information lines on the product are necessarily shown. Actually, it may be considered that almost all the components are connected to each other.

Abstract

[Problem] To assist with the selection of a test case for the purpose of efficiently and reliably performing a software test. [Solution] An information processing system (1) that assists with the selection of a test case: generates an N-dimensional frequency distribution which is a distribution of frequencies of appearance with regard to a combination of values that are based on data to be input into software to be tested and can be taken on by N factors; and estimates the frequency of appearance of a prescribed data pattern on the basis of the generated N-dimensional frequency distribution. The information processing system (1) outputs a frequency list (821) in which the estimated frequency of appearance with regard to each of a plurality of data patterns is disclosed and in which frequency of appearance cumulative values that have cumulated from the highest frequency of appearance are entered alongside. Furthermore, the information processing system (1) outputs: an excluded data pattern table (822) which is information representing data patterns in which the estimated frequency of appearance exceeds a prescribed value; and a data pattern frequency table (823) which is information representing data patterns in which the estimated frequency of appearance is equal to or less than the prescribed value.

Description

Information processing system for supporting test case selection and control method thereof
The present invention relates to an information processing system that supports the selection of test cases, and to a control method thereof.
In an information processing system that takes a large amount of data stored in a database as input and outputs results, the number of variations in the combinations of data that may be input becomes enormous. Therefore, when testing software for such an information processing system, the test cases must be narrowed down appropriately so that testing can be carried out efficiently.
In this regard, for example, Patent Document 1 describes generating test data that covers all patterns for every combination of N (M ≥ N > 1) parameters out of a total of M parameters that can be input to a program, in order to generate program test data with a small amount of data while ensuring the necessary diversity.
JP 2006-227958 A
In Patent Document 1, test data is generated so as to cover all combinations of any N parameters. Therefore, if, for example, most of the defects in the software are triggered by combinations of N or fewer parameters, most of the defects can be detected with the small number of test cases generated.
However, when many of the defects in the software are triggered by combinations of more than N parameters, the defects cannot be detected sufficiently. Increasing N increases the number of defects that can be detected, but the number of test cases grows accordingly, which increases the testing burden and the time required for testing.
In addition, because the method of Patent Document 1 aims to cover all combinations of any N parameters, it is difficult to grasp quantitatively what proportion of the data patterns that can actually occur has been verified by the tests. If the data actually in use could be accessed when generating test cases, it would be possible to determine how frequently each test case occurs. However, the data actually in use often contains confidential information such as personal information, and in many cases such data cannot be used effectively.
An object of the present invention is to support the selection of test cases for testing software efficiently and reliably, and thereby to improve the efficiency of software development and modification work.
One aspect of the present invention for achieving the above object is an information processing system that supports the selection of test cases. Based on data input to the software under test, the system generates an N-dimensional frequency distribution, which is a distribution of appearance frequencies for combinations of values that N elements can take, and estimates the appearance frequency of a predetermined data pattern based on the N-dimensional frequency distribution.
Other problems disclosed in the present application and their solutions will become apparent from the description of the embodiments and the drawings.
According to the present invention, it is possible to support the selection of test cases for testing software efficiently and reliably, and to improve the efficiency of software development and modification work.
FIG. 1 is a diagram illustrating a schematic configuration of the test case selection support system 1.
FIG. 2 is a hardware configuration example of the information processing apparatus 200 that can be used as the frequency distribution generation device 110 or the frequency estimation device 120.
FIG. 3 is a flowchart explaining the N-dimensional frequency distribution generation process S300.
FIG. 4 is an example of the data specification table 211.
FIG. 5 is an example of the N-dimensional frequency distribution table 221.
FIG. 6 is an example of the business DB table 212.
FIG. 7 is a flowchart explaining the N-dimensional frequency distribution reflection process S305.
FIG. 8 is a flowchart explaining the frequency estimation process S800.
FIG. 9 is an example of the input data pattern table 811.
FIG. 10 is an example of the frequency list 821.
FIG. 11 is an example of the excluded data pattern table 822.
FIG. 12 is an example of the data pattern frequency table 823.
FIG. 13 is a flowchart explaining the data pattern frequency estimation process S804.
FIG. 14 is a flowchart explaining the data pattern frequency output process S807.
FIG. 15 is a flowchart explaining the data pattern frequency estimation process S1500.
FIG. 16 is an example of the tree structure 1600 that the frequency estimation device 120 refers to when searching for data patterns.
Hereinafter, embodiments will be described in detail with reference to the drawings.
= First embodiment =
FIG. 1 shows a schematic configuration of an information processing system (hereinafter referred to as the test case selection support system 1) described as the first embodiment. The test case selection support system 1 is used, for example, when the business system 100 used in actual business operations is modified and the modified software (including programs and data) or newly added software is tested (acceptance testing).
The test case selection support system 1 uses data registered in the database of the business system 100 (hereinafter also referred to as the business DB 101) to generate information that serves as a criterion when a user selects test cases. Specifically, the test case selection support system 1 performs statistical processing on the data stored in the business DB 101, generates a distribution of appearance frequencies (hereinafter also referred to as an N-dimensional frequency distribution) for combinations of N elements (hereinafter also referred to as N elements), and uses the generated N-dimensional frequency distribution to generate information that serves as a criterion for selecting test cases.
Here, the N-dimensional frequency distribution does not contain highly confidential information such as personal information, and its data volume is far smaller than that of the data stored in the business DB 101. The N-dimensional frequency distribution can therefore be taken out of the operation site of the business system 100 and used at a software development site or the like, so that tests can be carried out safely and efficiently. In addition, because the test case selection support system 1 generates the information serving as a criterion for selecting test cases from data registered in the business DB 101 used in actual business, coverage of the data patterns that can actually occur can be increased and high-quality tests can be carried out.
As shown in FIG. 1, the test case selection support system 1 includes a frequency distribution generation device 110 and a frequency estimation device 120. Each is implemented using one or more information processing apparatuses (computers). In the business system 100, the business DB 101, a database that manages the data used in actual business, is in operation. Various software (not shown) that performs information processing using the business DB 101 also runs on the business system 100. Data is exchanged between the business system 100 and the frequency distribution generation device 110, and between the frequency distribution generation device 110 and the frequency estimation device 120, via, for example, communication means or a recording medium. The business system 100, the frequency distribution generation device 110, and the frequency estimation device 120 may each be implemented by an independent information processing apparatus, or any two or more of them may be implemented by the same information processing apparatus.
The frequency distribution generation device 110 reads data (or a data set) from the business DB 101 and generates an N-dimensional frequency distribution based on the read data. The frequency estimation device 120 reads the N-dimensional frequency distribution generated by the frequency distribution generation device 110 and estimates the appearance frequency of an arbitrary data pattern.
FIG. 2 shows an example of an information processing apparatus (computer) that implements the frequency distribution generation device 110 or the frequency estimation device 120. As shown in the figure, the information processing apparatus 200 includes a processor 201, a main storage device 202, an auxiliary storage device 203, an input device 204, a display device 205, and a communication device 206. These are communicably connected via communication means such as a bus (not shown).
The processor 201 is configured using, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). The various functions of the information processing apparatus 200 are realized by the processor 201 reading and executing programs stored in the main storage device 202.
The main storage device 202 is a device that stores programs and data, and is, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), or an NVRAM (Non Volatile RAM). The auxiliary storage device 203 is a hard disk drive, an SSD (Solid State Drive), an optical storage device, or the like. Programs and data stored in the auxiliary storage device 203 are loaded into the main storage device 202 as needed.
The input device 204 is a user interface that receives input of information and instructions from the user, and is, for example, a keyboard, a mouse, or a touch panel. The output device 205 is a user interface that provides information to the user, and is, for example, a graphic card, a liquid crystal monitor, or an LCD (Liquid Crystal Display). The communication device 206 is a communication interface that communicates with other devices via a communication network, and is, for example, a NIC (Network Interface Card).
As shown in FIG. 1, the frequency distribution generation device 110 includes a data specification reading unit 111, a business DB reading unit 112, a frequency distribution generation unit 113, a frequency distribution output unit 114, and a frequency distribution DB 115. These functions are realized by the processor 201 reading and executing programs stored in the main storage device 202.
The data specification reading unit 111 reads the contents of the data specification table 211. Details of the data specification table 211 will be described later. The business DB reading unit 112 reads data from the business DB 101 while referring to the data specification read by the data specification reading unit 111. The frequency distribution generation unit 113 performs statistical processing on the data read by the business DB reading unit 112 and generates an N-dimensional frequency distribution. The frequency distribution output unit 114 outputs the N-dimensional frequency distribution generated by the frequency distribution generation unit 113 to the frequency distribution DB 115. The frequency distribution DB 115 is a database (hereinafter also referred to as a "DB") managed by a DBMS (DataBase Management System). The frequency distribution DB 115 stores the N-dimensional frequency distribution output by the frequency distribution output unit 114 and provides its contents to the frequency estimation device 120.
As shown in FIG. 1, the frequency estimation device 120 includes a data pattern reading unit 121, a frequency distribution reading unit 122, a frequency estimation unit 123, a data pattern search unit 124, and a data pattern frequency output unit 125. These functions are realized by the processor 201 reading and executing programs stored in the main storage device 202.
The data pattern reading unit 121 reads, from the input data pattern table 811, the data patterns whose appearance frequency is to be estimated. The frequency distribution reading unit 122 reads the N-dimensional frequency distribution generated by the frequency distribution generation device 110 and stored in the frequency distribution DB 115. The frequency estimation unit 123 estimates the appearance frequency of the data patterns read by the data pattern reading unit 121, based on the N-dimensional frequency distribution read by the frequency distribution reading unit 122. The data pattern search unit 124 performs a depth-first search over the data patterns that can be input to the software under test, estimates the appearance frequency of each data pattern obtained during the search based on the N-dimensional frequency distribution, and efficiently extracts data patterns with a high appearance frequency. The data pattern frequency output unit 125 extracts data patterns with a high appearance frequency based on the appearance frequencies estimated by the frequency estimation unit 123, and outputs the extracted data patterns.
Next, the processing performed in the test case selection support system 1 configured as described above will be described in detail.
<N-dimensional frequency distribution generation processing>
FIG. 3 is a flowchart explaining the processing performed by the frequency distribution generation device 110 (hereinafter also referred to as the N-dimensional frequency distribution generation process S300). By performing the N-dimensional frequency distribution generation process S300, the frequency distribution generation device 110 selects N elements (hereinafter also referred to as N elements) from the data in the business DB 101, which is composed of M elements, and generates an N-dimensional frequency distribution using the selected N elements. In the following, the dimension N of the frequency distribution is assumed to be in the range 0 < N ≤ M. An N-dimensional frequency distribution is generated for every combination of N elements taken from the M elements.
As shown in the figure, the frequency distribution generation device 110 first reads the contents of the data specification table 211 (S301).
FIG. 4 shows an example of the data specification table 211. As shown in the figure, the data specification table 211 defines the specifications of the data registered in the business DB 101. The data specification table 211 is set in advance by, for example, a user.
The data item ID (hereinafter also referred to as DID 401) is an identifier assigned to each data item of the data registered in the business DB 101. The data item name 402 is the name of a data item included in the business DB 101; for example, a character string such as "name", "date of birth", "subscription period", or "average payment amount" is set.
The frequency distribution application flag 403 indicates whether a frequency distribution is generated for the data item: "1" is set when a frequency distribution is generated and "0" when it is not. In the example of FIG. 4, the character string "name" is not a target for generating a frequency distribution, so the frequency distribution application flag 403 is set to "0". The other data items are targets for obtaining the appearance frequency, so the flag 403 is set to "1" for all of them.
The boundary value 404 is a list of boundary values that divide the range of values each data item can take. In this example, the values used in the equivalence analysis method from the field of software testing are set as the boundary values. In the equivalence analysis method, the values at which the behavior of the software changes are listed as boundary values, and representative values are extracted from the sections delimited by the boundary values and used for testing. This makes it possible to verify the behavior of the software across the board with a minimum amount of testing. For example, in FIG. 4, the values "0", "12", ..., "480", at which the behavior of the software changes, are set as the boundary values for "subscription period". The boundary values are set by, for example, a user based on the software specification or source code. When it is difficult to set boundary values from the software specification or source code, they may be set mechanically, for example by setting 10 values at equal intervals from the minimum value toward the maximum value.
Returning to FIG. 3, in S302 the frequency distribution generation device 110 initially generates the N-dimensional frequency distribution table 221. Specifically, the frequency distribution generation device 110 obtains every combination of N elements among the data items whose frequency distribution application presence/absence 403 is "1", and generates an N-dimensional frequency distribution table 221 for each combination. For example, when there are 10 data items whose frequency distribution application presence/absence 403 is "1" and N = 2, the frequency distribution generation device 110 obtains 45 (= 10C2) combinations and generates an N-dimensional frequency distribution table 221 for each of them.
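The count of 45 tables can be checked with a short sketch. The dictionary `apply_flags` and the function `item_combinations` are hypothetical names introduced only for this illustration; the DID numbering is likewise assumed.

```python
from itertools import combinations
from math import comb


def item_combinations(apply_flags, n):
    """Return every n-element combination of DIDs whose application flag is 1."""
    targets = [did for did, flag in sorted(apply_flags.items()) if flag == 1]
    return list(combinations(targets, n))


# Hypothetical specification: DID 1 (for example a name) is excluded,
# DIDs 2 to 11 (ten data items) are included.
apply_flags = {1: 0, **{did: 1 for did in range(2, 12)}}
pairs = item_combinations(apply_flags, 2)
print(len(pairs), comb(10, 2))
```

Each tuple in `pairs` would correspond to one N-dimensional frequency distribution table when N = 2.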
FIG. 5 shows an example of the N-dimensional frequency distribution table 221. The figure shows only the table for one specific combination. In the figure, the first element 501 indicates the section number of the first element z of the N elements (a section number is an identifier uniquely assigned to each section delimited by the boundary values). In this example, sections are defined using the boundary values 404 in the data specification table 211 as their boundaries, and section numbers are assigned to the sections in ascending order. For example, if the boundary values 404 are "0", "10", ..., the section number of z is "1" when z < 0 and "2" when 0 < z ≤ 10. Similarly, the Nth element 502 indicates the section number of the Nth of the N elements. The frequency 503 is set to the number of records registered in the business DB 101 that belong to the corresponding sections of the first element 501 through the Nth element 502.
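The mapping from a value to its section number can be sketched with a binary search over the sorted boundary values. The function name `section_number` is hypothetical. The text leaves the case z = 0 unassigned; placing a value equal to a boundary in the lower section (so that section 1 is z ≤ 0) is an assumption made here to keep the sections exhaustive.

```python
from bisect import bisect_left


def section_number(value, boundaries):
    """Return the 1-based section number of `value`.

    `boundaries` must be sorted in ascending order. Section k + 1 covers
    boundaries[k - 1] < value <= boundaries[k]; a value equal to a boundary
    falls in the lower section (an assumption, see the lead-in).
    """
    return bisect_left(boundaries, value) + 1


boundaries = [0, 10, 20]
for z in (-5, 5, 10, 11, 25):
    print(z, section_number(z, boundaries))
```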
Returning to FIG. 3, the frequency distribution generation device 110 then assigns 1 to the variable i (S303) and reads the record whose ID is i (hereinafter also referred to as business DB data) from the table of the business DB 101 (hereinafter referred to as the business DB table 212) (S304).
FIG. 6 shows an example of the business DB table 212. As shown in the figure, the business DB table 212 consists of a plurality of records, each having items such as an ID 601, a name 602, a date of birth 603, a subscription period 604, and an average payment amount 605. The ID 601 is an identifier assigned to each record of the business DB 101 (hereinafter also referred to as a record ID).
Returning to FIG. 3, the frequency distribution generation device 110 then reflects the contents of the i-th business DB data in the temporary N-dimensional frequency distribution table 221 (S305). The details of this process (hereinafter also referred to as the N-dimensional frequency distribution reflection process S305) will be described later.
Next, the frequency distribution generation device 110 adds 1 to the variable i (S306) and determines whether the variable i exceeds the number of records in the business DB 101 (or a preset number of repetitions) (S307). If the variable i exceeds the number of records in the business DB 101 (S307: Yes), the process proceeds to S308. If it does not (S307: No), the process returns to S304.
In S308, the frequency distribution generation device 110 outputs the contents of the temporary N-dimensional frequency distribution table 221 to the N-dimensional frequency distribution table 221.
FIG. 7 is a flowchart explaining the details of the N-dimensional frequency distribution reflection process S305 of FIG. 3. The process is described below with reference to the figure. For simplicity, the case of N = 2 is used as an example. For N > 2, new variables must be introduced in addition to the existing variables (m, n) so that the process has an N-fold nested loop.
As shown in the figure, the frequency distribution generation device 110 first assigns "1" to the variable m (S701). The variable m corresponds to the DID 401 of the first element described above.
Next, the frequency distribution generation device 110 refers to the data specification table 211, reads the value of the frequency distribution application presence/absence 403 of the entry whose DID 401 is m, and determines whether the value is "1" (S702). If the value is "1", the process proceeds to S703. If the value is "0", the process proceeds to S709.
In S703, the frequency distribution generation device 110 assigns "m + 1" to the variable n. The variable n corresponds to the DID 401 of the second element. The value "m + 1" is used because the targets are combinations of the two variables m and n, so it is sufficient to consider only the case m < n.
Next, the frequency distribution generation device 110 refers to the data specification table 211, reads the value of the frequency distribution application presence/absence 403 of the entry whose DID 401 is n, and determines whether the value is "1" (S704). If the value is "1" (S704: Yes), the process proceeds to S705. If the value is "0" (S704: No), the process proceeds to S707.
Next, the frequency distribution generation device 110 extracts the value whose DID 401 is m from the business DB data whose ID 601 (record ID) is i in the business DB table 212, compares the extracted value against the data specification table 211, and assigns the corresponding section number to the variable s. Likewise, it extracts the value whose DID 401 is n from the same business DB data, compares it against the data specification table 211, and assigns the corresponding section number to the variable t (S705).
Next, the frequency distribution generation device 110 updates the entry of the temporary N-dimensional frequency distribution table 221 corresponding to the first element DID = m and the second element DID = n (S706). Specifically, the frequency distribution generation device 110 determines whether the table contains an entry whose first-element section number 501 is s and whose second-element section number 502 is t. If such an entry exists, it adds 1 to the frequency of that entry. If not, it adds a new entry to the temporary N-dimensional frequency distribution table 221 and sets s as the first-element section number 501, t as the second-element section number 502, and 1 as the frequency 503.
Next, the frequency distribution generation device 110 adds 1 to the variable n (S707) and determines whether the variable n exceeds the total number of data items in the business DB table 212 (S708). If it does (S708: Yes), the process proceeds to S709. If it does not (S708: No), the process returns to S704.
Next, the frequency distribution generation device 110 adds 1 to the variable m (S709) and determines whether the variable m exceeds the total number of data items in the business DB table 212 (S710). If it does (S710: Yes), the process ends. If it does not (S710: No), the process returns to S702.
Through the processing described above, the data stored in the business DB 101 is converted into N-dimensional frequency distributions.
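The loops of FIG. 3 and FIG. 7 for N = 2 can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation of the specification: `build_pair_distributions`, the `spec` layout (DID mapped to a flag and a boundary list), and the record layout (DID mapped to a value) are hypothetical, and the section-number convention (a value equal to a boundary falls in the lower section) is assumed.

```python
from bisect import bisect_left
from collections import Counter


def build_pair_distributions(records, spec):
    """Count (s, t) section pairs for every item pair (m, n) with m < n.

    records: list of dicts mapping DID -> value (hypothetical layout).
    spec: dict mapping DID -> (apply_flag, sorted boundary list).
    Returns {(m, n): Counter({(s, t): frequency})}.
    """
    dids = sorted(spec)
    tables = {}
    for record in records:                       # S303 to S307: every record
        for idx, m in enumerate(dids):           # S701, S709, S710
            if spec[m][0] != 1:                  # S702: skip unflagged items
                continue
            for n in dids[idx + 1:]:             # S703, S707, S708: only m < n
                if spec[n][0] != 1:              # S704
                    continue
                s = bisect_left(spec[m][1], record[m]) + 1   # S705
                t = bisect_left(spec[n][1], record[n]) + 1
                tables.setdefault((m, n), Counter())[(s, t)] += 1  # S706
    return tables


# Hypothetical data: DID 1 is a name (flag 0); DIDs 2 and 3 are numeric.
spec = {1: (0, []), 2: (1, [0, 12, 24]), 3: (1, [100, 200])}
records = [
    {1: "A", 2: 6, 3: 150},
    {1: "B", 2: 6, 3: 180},
    {1: "C", 2: 30, 3: 50},
]
tables = build_pair_distributions(records, spec)
print(tables)
```

A `Counter` creates missing entries on first increment, which plays the role of the "add a new entry with frequency 1" branch of S706.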
<Frequency estimation processing>
Next, the process in which the frequency estimation device 120 estimates the appearance frequency of data patterns based on the N-dimensional frequency distributions generated as described above (hereinafter also referred to as the frequency estimation process S800) is described.
FIG. 8 is a flowchart explaining the frequency estimation process S800. Using the N-dimensional frequency distribution table 221 generated by the frequency distribution generation device 110, the frequency estimation device 120 performs the frequency estimation process S800 on preset data patterns and thereby estimates the appearance frequency of each data pattern.
As shown in the figure, the frequency estimation device 120 first reads the input data pattern table 811, a table in which the preset data patterns are registered (S801).
FIG. 9 shows an example of the input data pattern table 811. In the figure, the pattern ID 901 is an identifier uniquely assigned to each data pattern (hereinafter also referred to as a pattern ID). Reference numerals 902 to 904 are the DIDs 401 of the data specification table 211 corresponding to the respective data items. The numerical value at the intersection of a pattern ID 901 and any of reference numerals 902 to 904 is a section number. In this example, only data items whose frequency distribution application presence/absence 403 is "1" in the data specification table 211 are included. The frequency estimation device 120 estimates the appearance frequency of each data pattern stored in the input data pattern table 811.
Returning to FIG. 8, the frequency estimation device 120 then assigns 1 to the variable i (S802) and reads the i-th data pattern from the input data pattern table 811 (S803).
Next, the frequency estimation device 120 estimates the appearance frequency of the i-th data pattern and outputs the result to the frequency list 821 (S804). The frequency estimation device 120 estimates the appearance frequency based on the N-dimensional frequency distribution table 221 generated by the frequency distribution generation device 110. The details of this process (hereinafter also referred to as the data pattern frequency estimation process S804) will be described later.
FIG. 10 shows an example of the frequency list 821. As shown in the figure, the frequency list 821 contains one or more records, each having three items: a pattern ID 1001, a frequency estimate 1002, and a frequency upper limit 1003. The pattern ID 1001 is the pattern ID in the input data pattern table 811. The frequency estimate 1002 is a value (probability) indicating the proportion of the business DB 101 that the data pattern accounts for. The frequency upper limit 1003 is the upper limit (probability) of the appearance frequency of the data pattern in the business DB 101.
Returning to FIG. 8, the frequency estimation device 120 then adds 1 to the variable i (S805) and determines whether the variable i exceeds the total number of patterns stored in the input data pattern table 811 (S806). If it does (S806: Yes), the process proceeds to S807. If it does not (S806: No), the process returns to S803.
In S807, the frequency estimation device 120 outputs the excluded data pattern table 822 and the data pattern frequency table 823. The details of this process (hereinafter also referred to as the data pattern frequency output process S807) will be described later.
FIG. 11 shows an example of the excluded data pattern table 822 that the frequency estimation device 120 outputs in S807. The excluded data pattern table 822 stores the data patterns whose estimated appearance frequency was determined to be equal to or less than a preset threshold T%. As shown in the figure, the excluded data pattern table 822 consists of a plurality of records, each having a pattern ID 1101, a frequency estimate 1102, a frequency upper limit 1103, and the section number of each data item (reference numerals 1104 to 1106). The information stored in the excluded data pattern table 822 is the same as the corresponding information in the input data pattern table 811 and the frequency list 821.
FIG. 12 shows an example of the data pattern frequency table 823 that the frequency estimation device 120 outputs in S807. The data pattern frequency table 823 consists of the patterns whose frequency upper limit 1204 exceeds the threshold T%. As shown in the figure, the data pattern frequency table 823 consists of a plurality of records, each having a pattern ID 1201, a frequency estimate 1202, a cumulative frequency 1203, a frequency upper limit 1204, and the section number of each data item (reference numerals 1205 to 1207). The information other than the cumulative frequency 1203 is the same as the corresponding information in the input data pattern table 811 and the frequency list 821. In the data pattern frequency table 823, the data patterns are sorted in descending order of the frequency estimate 1202, and the cumulative frequency 1203 of a data pattern stores the running total of the frequency estimates from the top-ranked data pattern down to that data pattern.
When software testing is performed on the business system 100, the data patterns stored in the data pattern frequency table 823 are adopted as test cases, for example, in descending order of appearance frequency. This raises the coverage of the data patterns that the business system 100 can actually encounter while making the test process more efficient, and it also helps improve the quality of the business system 100. In addition, by referring to the cumulative frequency 1203, a user can see what proportion of all data patterns contained in the business DB 101 is covered when testing is performed down to a given data pattern. When a selected data pattern is used as a test case, its section numbers must be converted, for example by replacing each section number with a representative value of the section, before the pattern is input to the system. Specifically, the boundary values in the data specification table 211 are referred to, and the minimum, maximum, or median value of the section indicated by the section number is used as the representative value.
Next, the details of the data pattern frequency estimation process S804 of FIG. 8 are described with reference to the flowchart shown in FIG. 13.
First, the frequency estimation device 120 assigns 1 to the variable m, which indicates the DID 401 of the first element, and to the variable j, which indicates the number of times the appearance frequency estimate has been updated (S1301).
Next, the frequency estimation device 120 assigns m + 1 to the variable n, which indicates the DID 401 of the second element (S1302). The value "m + 1" is assigned to the variable n because the targets are combinations of the two variables m and n, so it is sufficient to consider only the case m < n.
Next, in the input data pattern table 811, the frequency estimation device 120 takes the data pattern whose pattern ID 901 equals the variable i and assigns the section number of the data item whose DID 401 is m to the variable B(m). Similarly, it assigns the section number of the data item whose DID 401 is n in the same data pattern to the variable B(n) (S1303).
Next, the frequency estimation device 120 reads, from the N-dimensional frequency distribution table 221, the frequency 503 of the entry whose first-element section number is B(m) and whose second-element section number is B(n) (hereinafter denoted by the variable F_{m,n}(B(m), B(n))) (S1304).
Next, the frequency estimation device 120 updates the estimate of the appearance frequency of the i-th data pattern (S1305). Here, the frequency estimation device 120 obtains the estimate E(j) of the appearance frequency from the following equation, based on the estimate E(j-1) from the (j-1)-th update and the variable F_{m,n}(B(m), B(n)) read in S1304.
 E(j) = ((j - 1) × E(j - 1) + e(j)) / j
Here, e(j) is the provisional frequency estimate obtained at the j-th update, and E(j) is the mean of e(1) through e(j). e(j) is the provisional estimate obtained by assuming that B(m) and B(n) are correlated and that all other data items are mutually independent, and is given by the following equation.
 e(j) = P_all × S × F_{m,n}(B(m), B(n)) / (F_m(B(m)) × F_n(B(n)))
Here, S is the total number of records in the business DB table 212, which equals the sum of the frequencies 503 in the N-dimensional frequency distribution table 221. F_m(B(m)) is the number of occurrences of section number B(m) for data item m and, for an arbitrary DID n, is given by the following equation (the sum runs over every section number of data item n).
 F_m(B(m)) = F_{m,n}(B(m), 1) + ... + F_{m,n}(B(m), M)
P_all is the frequency of the i-th data pattern obtained by assuming that the section numbers of all data items are mutually independent, and is given by the following equation.
 P_all = (F_1(B(1)) / S) × (F_2(B(2)) / S) × ... × (F_M(B(M)) / S)
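The form of e(j) can be read as a correction applied to the fully independent estimate P_all: the two independent factors for data items m and n are divided out and replaced by their observed joint frequency. The derivation below is a sketch that uses only the quantities defined above. The shorthand p_k and p_{m,n} is introduced here for brevity and does not appear in the original text.

```latex
\begin{aligned}
P_{\mathrm{all}} &= \prod_{k=1}^{M} p_k,
  \qquad p_k = \frac{F_k(B(k))}{S},
  \qquad p_{m,n} = \frac{F_{m,n}(B(m),B(n))}{S} \\[4pt]
e(j) &= \frac{P_{\mathrm{all}}}{p_m\,p_n}\; p_{m,n}
      = P_{\mathrm{all}} \cdot
        \frac{F_{m,n}(B(m),B(n))/S}{\bigl(F_m(B(m))/S\bigr)\bigl(F_n(B(n))/S\bigr)}
      = P_{\mathrm{all}} \cdot
        \frac{S \cdot F_{m,n}(B(m),B(n))}{F_m(B(m))\,F_n(B(n))}
\end{aligned}
```

When data items m and n are in fact independent, p_{m,n} = p_m p_n and e(j) reduces to P_all.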
Next, the frequency estimation device 120 updates the upper limit U(j) of the frequency of the i-th data pattern (S1306). The upper limit U(j) is obtained from the following equation, based on the upper limit U(j-1) from the (j-1)-th update and the variable F_{m,n}(B(m), B(n)) read in S1304.
 U(j) = MIN(U(j - 1), F_{m,n}(B(m), B(n)) / S)
Here, MIN() is a function that returns the smallest of its arguments, so the upper limit U(j) ends up holding the minimum of F_{m,n}(B(m), B(n)) / S over all m and n. F_{m,n}(B(m), B(n)) / S represents the appearance frequency of the data patterns in which only data items m and n are constrained. The appearance frequency of a data pattern that also specifies data items other than m and n is therefore at most F_{m,n}(B(m), B(n)) / S. Consequently, when U(j) is taken as the minimum of F_{m,n}(B(m), B(n)) / S over all m and n, the actual appearance frequency of the data pattern is guaranteed to be at most U(j).
A specific example follows. Suppose the data items are "age", "pension enrollment period", and "average monthly income", and the appearance frequencies of the data patterns combining two of these items are estimated as 10% for "age = 30 years, pension enrollment period = 10 years", 0.5% for "age = 30 years, average monthly income = 300,000 yen", and 1.0% for "pension enrollment period = 10 years, average monthly income = 300,000 yen". The upper limit of the appearance frequency of the three-item data pattern "age = 30 years, pension enrollment period = 10 years, average monthly income = 300,000 yen" is then 0.5%, the smallest of these values.
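The guarantee that the pairwise minimum bounds the true frequency can be checked by brute force on a toy data set. The records below are invented for illustration (each tuple holds three section numbers), and the function names `pairwise_upper_bound` and `actual_frequency` are hypothetical.

```python
from itertools import combinations


def pairwise_upper_bound(records, pattern):
    """Minimum, over all item pairs, of the share of records matching that pair."""
    total = len(records)
    bound = 1.0
    for m, n in combinations(range(len(pattern)), 2):
        hits = sum(1 for r in records if r[m] == pattern[m] and r[n] == pattern[n])
        bound = min(bound, hits / total)
    return bound


def actual_frequency(records, pattern):
    """Share of records that match the pattern on every item."""
    return sum(1 for r in records if tuple(r) == tuple(pattern)) / len(records)


# Invented toy records: (section of item 1, section of item 2, section of item 3).
records = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (1, 1, 1)]
pattern = (1, 1, 1)
bound = pairwise_upper_bound(records, pattern)
actual = actual_frequency(records, pattern)
print(actual, bound)
```

Every record that matches the full pattern also matches each pair, so the full-pattern count can never exceed any pairwise count.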
Next, the frequency estimation device 120 adds 1 to the variable n and 1 to the variable j (S1307), and determines whether the variable n exceeds the total number of elements M (S1308). If the variable n exceeds M (S1308: Yes), the process proceeds to S1309. If the variable n is M or less (S1308: No), the process returns to S1303.
Next, the frequency estimation device 120 adds 1 to the variable m (S1309) and determines whether the variable m exceeds M-1 (S1310). If the variable m exceeds M-1 (S1310: Yes), the process proceeds to S1311. If the variable m is M-1 or less (S1310: No), the process returns to S1302.
In S1311, the frequency estimation device 120 outputs the frequency estimate E(j) and the frequency upper limit U(j), which were updated repeatedly in S1305 and S1306, to the frequency list 821. Specifically, the frequency estimation device 120 writes the pattern ID of the data pattern to the pattern ID 1001, the corresponding E(j) to the frequency estimate 1002, and U(j) to the frequency upper limit 1003. The data pattern frequency estimation process S804 of FIG. 8 is performed as described above.
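The whole of S1301 to S1311 for N = 2 can be sketched as a single function. This is a minimal illustration under assumptions, not the implementation of the specification: `estimate_pattern` and the table layout are hypothetical, the initial upper limit U(0) = 1.0 is assumed because the text does not state it, and a zero marginal is treated as a provisional estimate of 0.

```python
from itertools import combinations


def estimate_pattern(pattern, pair_tables, total):
    """Return (E, U) for one data pattern.

    pattern: {DID: section number}.
    pair_tables: {(m, n): {(s, t): count}} with m < n (hypothetical layout).
    total: S, the number of records.
    """
    dids = sorted(pattern)

    def marginal(did):
        # F_did(B(did)): sum one pair table over the partner item's sections.
        for (m, n), table in pair_tables.items():
            if m == did:
                return sum(c for (s, _), c in table.items() if s == pattern[did])
            if n == did:
                return sum(c for (_, t), c in table.items() if t == pattern[did])
        raise KeyError(did)

    marg = {d: marginal(d) for d in dids}
    p_all = 1.0
    for d in dids:
        p_all *= marg[d] / total

    estimate, upper, j = 0.0, 1.0, 0          # U(0) = 1.0 is an assumption
    for m, n in combinations(dids, 2):        # S1302, S1307 to S1310
        j += 1
        f_mn = pair_tables[(m, n)].get((pattern[m], pattern[n]), 0)   # S1304
        denom = marg[m] * marg[n]
        e_j = p_all * total * f_mn / denom if denom else 0.0
        estimate = ((j - 1) * estimate + e_j) / j                     # S1305
        upper = min(upper, f_mn / total)                              # S1306
    return estimate, upper


# Invented toy records over DIDs 1 to 3, tallied into pair tables.
records = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (1, 1, 1)]
pair_tables = {}
for rec in records:
    for m, n in combinations((1, 2, 3), 2):
        table = pair_tables.setdefault((m, n), {})
        key = (rec[m - 1], rec[n - 1])
        table[key] = table.get(key, 0) + 1

estimate, upper = estimate_pattern({1: 1, 2: 1, 3: 1}, pair_tables, len(records))
print(estimate, upper)
```

In this toy data the pattern actually occurs in 2 of 5 records (0.4), which lies below the computed upper limit, while the estimate is an average of the three pairwise-corrected values.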
Next, the details of the data pattern frequency output process S807 of FIG. 8 are described with reference to the flowchart shown in FIG. 14.
As shown in the figure, the frequency estimation device 120 first assigns 1 to the variable i (S1401) and reads the data pattern whose pattern ID matches i from the frequency list 821 (S1402).
Next, the frequency estimation device 120 compares the appearance frequency of the data pattern that was read (hereinafter referred to as the data pattern in question) with a preset appearance frequency threshold T (0% ≤ T < 100%) (S1403). If the appearance frequency of the data pattern in question exceeds the threshold T (S1403: Yes), the frequency estimation device 120 adds the data pattern to the data pattern frequency table 823 (S1405). At this time, the frequency estimation device 120 obtains the frequency estimate 1002 and the frequency upper limit 1003 of the data pattern from the frequency list 821 and sets them in the data pattern frequency table 823. The frequency estimation device 120 also obtains the section numbers 902 to 904 of the record whose pattern ID is i from the input data pattern table 811 and sets them in the data pattern frequency table 823. The process then proceeds to S1406.
On the other hand, if the appearance frequency of the data pattern in question is equal to or less than the threshold T (S1403: No), the frequency estimation device 120 adds the data pattern to the excluded data pattern table 822 (S1404). At this time, the frequency estimation device 120 obtains the frequency estimate 1002 and the frequency upper limit 1003 of the data pattern from the frequency list 821 and sets them in the excluded data pattern table 822. The frequency estimation device 120 also obtains the section numbers 902 to 904 of the record whose pattern ID is i from the input data pattern table 811 and sets them in the excluded data pattern table 822. The process then proceeds to S1406.
In S1406, the frequency estimation device 120 adds 1 to the variable i, and then determines whether the variable i exceeds the total number of data patterns contained in the frequency list 821 (S1407). If it does (S1407: Yes), the process proceeds to S1408. If the variable i is equal to or less than the total number of data patterns (S1407: No), the process returns to S1402.
In S1408, the frequency estimation device 120 sorts the data patterns contained in the data pattern frequency table 823 in descending order of the frequency estimate 1202. In S1409, the frequency estimation device 120 obtains, for each data pattern, the sum of the frequency estimates 1202 from the data pattern with the largest frequency estimate down to that data pattern, and assigns the result to the cumulative frequency 1203 of that data pattern. The data pattern frequency output process S807 of FIG. 8 is performed as described above.
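The output process S1401 to S1409 amounts to a threshold split, a sort, and a running sum, as in the sketch below. The function name `split_and_rank` and the tuple layout are hypothetical. The text compares "the appearance frequency" with T without stating whether that means the estimate or the upper limit, so comparing the frequency estimate is an assumption made here.

```python
from itertools import accumulate


def split_and_rank(frequency_list, threshold):
    """Split patterns by threshold, then rank the kept ones with cumulative sums.

    frequency_list: list of (pattern_id, estimate, upper_limit) tuples.
    Assumption: the frequency estimate is the value compared with the threshold.
    Returns (ranked, excluded), where ranked rows are
    (pattern_id, estimate, cumulative, upper_limit).
    """
    kept = [row for row in frequency_list if row[1] > threshold]        # S1403: Yes
    excluded = [row for row in frequency_list if row[1] <= threshold]   # S1403: No
    kept.sort(key=lambda row: row[1], reverse=True)                     # S1408
    running = accumulate(row[1] for row in kept)                        # S1409
    ranked = [(pid, est, cum, up) for (pid, est, up), cum in zip(kept, running)]
    return ranked, excluded


# Invented frequency list: (pattern ID, frequency estimate, frequency upper limit).
frequency_list = [(1, 0.25, 0.5), (2, 0.5, 0.75), (3, 0.0, 0.125)]
ranked, excluded = split_and_rank(frequency_list, threshold=0.125)
print(ranked)
print(excluded)
```

The cumulative column of the last kept row gives the estimated share of the business DB covered when every kept pattern is tested.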
= Second Embodiment =
In the second embodiment, manually configured data patterns are not input as in the first embodiment. Instead, a depth-first search is performed over all data patterns that can be input to the software under test, the appearance frequency of each data pattern obtained during the search is estimated based on the N-dimensional frequency distributions, and the data patterns with the top X appearance frequencies are output. However, because the number of possible data patterns is enormous, it is difficult to estimate the appearance frequency of every pattern. In this embodiment, therefore, data patterns with low frequency are pruned during the search for data patterns whose frequency is to be estimated, so that the top X data patterns are extracted efficiently.
FIG. 15 is a flowchart illustrating the data pattern frequency estimation process S1500 described as the second embodiment.
As shown in the figure, the frequency estimation device 120 first assigns 1 to the variable i as an initial value (S1501). The frequency estimation device 120 then estimates the appearance frequency of the i-th data pattern (S1502). This process is performed in the same manner as the data pattern frequency estimation process S804 of FIG. 13 in the first embodiment. The frequency estimation device 120 obtains the frequency estimated value 1002 and the frequency upper limit value 1003 for the i-th data pattern and stores them in the frequency list 821 in association with the pattern ID.
Next, the frequency estimation device 120 searches for a data pattern whose appearance frequency is to be estimated (S1503) and assigns the pattern ID of the data pattern found to the variable i (S1504). Because the input data pattern table 811 is not used in this embodiment, the frequency estimation device 120 automatically generates a unique pattern ID for each data pattern it searches.
FIG. 16 shows an example of a tree structure 1600 that the frequency estimation device 120 refers to when searching for data patterns. Here, as an example, three data items are assumed to exist: "date of birth", "subscription period", and "average payment amount".
First, the frequency estimation device 120 generates a node for each section number of the first data item, "date of birth", and connects it directly under the root 1600. Here, node 1601 and node 1607 have been added. Next, the frequency estimation device 120 generates nodes 1602 and 1605 for the section numbers of the next data item, "subscription period", and connects them under node 1601. The tree is then extended downward (toward the branches further on) in the same manner, increasing the number of elements at each level.
When searching for data patterns, the frequency estimation device 120 traverses the above tree structure by depth-first search. That is, when the search starts from a given node, the child nodes of that node are searched first, and only after all nodes at deeper levels have been searched does the search return to the parent node and continue. In the example of FIG. 16, the frequency estimation device 120 searches the tree structure in the order node 1601, node 1602, node 1603, node 1604, ..., node 1605, node 1606, ..., node 1607.
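The traversal order described above can be reproduced with a small generator. This is a sketch under assumptions: each node is represented as a tuple of section numbers (one per data item fixed so far), and every data item is given two sections purely for illustration; neither representation appears in the original text.

```python
def depth_first_patterns(section_counts, prefix=()):
    """Yield partial data patterns in depth-first (pre-order) order.

    section_counts[k] is the number of sections of the k-th data item.
    A node is a tuple of section numbers for the items fixed so far.
    """
    depth = len(prefix)
    if depth == len(section_counts):
        return  # all data items are fixed; no deeper level exists
    for section in range(1, section_counts[depth] + 1):
        node = prefix + (section,)
        yield node  # visit the node itself first
        yield from depth_first_patterns(section_counts, node)  # then its children


# Three data items (e.g. date of birth, subscription period, average payment
# amount), each split into two sections for this example.
order = list(depth_first_patterns([2, 2, 2]))
print(order[:8])
```

With this representation, `(1,)` plays the role of node 1601, `(1, 1)` of node 1602, `(1, 1, 1)` and `(1, 1, 2)` of nodes 1603 and 1604, `(1, 2)` of node 1605, and `(2,)` of node 1607.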
However, if such a search is performed over all data patterns, the amount of computation becomes enormous and the processing load increases. In this embodiment, therefore, the search is made more efficient by not searching nodes that, judging from the estimated appearance frequency of each node, are certain not to be among the top X.
Because the data patterns corresponding to a child node are a subset of the data patterns corresponding to its parent node, the frequency estimated value and the frequency upper limit value of the child node are guaranteed to be smaller than those of the parent node. Using this property, the estimated appearance frequency of the parent node being searched is compared with the X-th highest estimated appearance frequency among the data patterns whose appearance frequency has been estimated so far in the search. If the estimated appearance frequency of the parent node is smaller, its child nodes are not searched (the search is cut off).
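The pruning rule can be sketched as a depth-first search that keeps the X best estimates seen so far in a min-heap. This is an illustration only. The estimator used here, a product of per-item section frequencies, is a stand-in chosen because it never increases as the pattern gets longer; it is not the estimation based on the N-dimensional frequency distribution described for the first embodiment. The names `top_x_patterns` and `marginals` are hypothetical.

```python
import heapq


def top_x_patterns(marginals, x):
    """Depth-first search with pruning against the X-th best estimate so far.

    marginals[k][s] is an assumed frequency of section s+1 of data item k.
    Returns (top patterns sorted by estimate, number of nodes estimated).
    """
    best = []    # min-heap of (estimate, pattern); best[0] is the X-th highest
    visited = 0  # number of nodes whose frequency was estimated

    def visit(prefix, estimate):
        nonlocal visited
        depth = len(prefix)
        if depth == len(marginals):
            return
        for section, freq in enumerate(marginals[depth], start=1):
            node = prefix + (section,)
            node_estimate = estimate * freq  # never larger than the parent's
            visited += 1
            if len(best) == x and node_estimate < best[0][0]:
                continue  # prune: no descendant can reach the top X
            if len(best) < x:
                heapq.heappush(best, (node_estimate, node))
            else:
                heapq.heappushpop(best, (node_estimate, node))
            visit(node, node_estimate)

    visit((), 1.0)
    return sorted(best, reverse=True), visited


top, visited = top_x_patterns([[0.75, 0.25], [0.5, 0.5]], x=3)
print(top, visited)
```

In this example the node `(2,)` has an estimate of 0.25, which is below the third-best estimate found so far (0.375), so its two children are never estimated: 4 of the 6 nodes are visited. Because the X-th best value can only rise as the search proceeds, a pruned branch can never contain a pattern that would have entered the final top X.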
Returning to FIG. 15, after the search for data patterns is complete (S1505: YES), the frequency estimation device 120 outputs the appearance frequencies of the data patterns (S1506). This process is performed in the same manner as the data pattern frequency output process S807 of FIG. 14 in the first embodiment. After the data pattern frequency table 823 has been sorted by frequency estimated value in S1408 of FIG. 14, only the X data patterns with the highest appearance frequencies may be output, rather than the appearance frequencies of all the data patterns obtained.
As a result, a list of the data patterns with the top X appearance frequencies is output from the data included in the business DB table 212. A user can thus learn which data patterns appear frequently without preparing the input data pattern table 811 in advance as in the first embodiment, and can test the software efficiently.
The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations that include all of the elements described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. In addition, for part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.
Some or all of the above configurations, functions, processing units, processing means, and the like may be realized in hardware, for example by designing them as integrated circuits. The above configurations, functions, and the like may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as programs, tables, and files that implement each function can be stored in a memory, in a recording device such as a hard disk or an SSD, or on a recording medium such as an IC card, an SD card, or a DVD.
The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all of the components may be considered to be connected to one another.
DESCRIPTION OF SYMBOLS
1 test case selection support system, 100 business system, 101 business DB, 110 frequency distribution generation device, 111 data specification reading unit, 112 business DB reading unit, 113 frequency distribution generation unit, 114 frequency distribution output unit, 115 frequency distribution DB, 120 frequency estimation device,
121 data pattern reading unit, 122 frequency distribution reading unit, 123 frequency estimation unit, 124 data pattern search unit, 125 data pattern frequency output unit, S300 N-dimensional frequency distribution generation process, 211 data specification table, 401 DID, 402 data item name, 403 frequency distribution application flag, 221 N-dimensional frequency distribution table, 212 business DB table,
S305 N-dimensional frequency distribution reflection process, S800 frequency estimation process, 821 frequency list, 822 excluded data pattern table, 823 data pattern frequency table

Claims (10)

  1.  An information processing system that supports selection of test cases, the system generating, based on data input to software under test, an N-dimensional frequency distribution that is a distribution of appearance frequencies for combinations of values that N elements can take, and estimating an appearance frequency of a predetermined data pattern based on the N-dimensional frequency distribution.
  2.  The information processing system according to claim 1, wherein the appearance frequency is estimated based on the N-dimensional frequency distribution for each of a plurality of input data patterns, and a list describing the appearance frequency estimated for each of the data patterns is output.
  3.  The information processing system according to claim 2, wherein the data patterns are sorted in order of the appearance frequency estimated for each, and a cumulative value of the appearance frequencies, accumulated from the highest appearance frequency, is output alongside each of the data patterns.
  4.  The information processing system according to claim 2, wherein information indicating the data patterns whose estimated appearance frequency exceeds a predetermined value, or information indicating the data patterns whose estimated appearance frequency is equal to or less than the predetermined value, is output.
  5.  The information processing system according to claim 1, wherein an appearance frequency estimated for each combination of values that the N elements included in the data pattern can take is acquired from the N-dimensional frequency distribution, and the minimum of the acquired appearance frequencies is output as an upper limit value of the appearance frequency of the data pattern.
  6.  The information processing system according to claim 1, wherein a depth-first search is performed on data patterns that can be input to the software under test, the appearance frequency of each data pattern acquired in the course of the search is estimated based on the N-dimensional frequency distribution, and a predetermined number of data patterns are selected and output in descending order of the estimated appearance frequency.
  7.  The information processing system according to claim 6, wherein
     a depth-first search is performed on all data patterns that can be input to the software under test, and the appearance frequency of each data pattern acquired in the course of the search is estimated based on the N-dimensional frequency distribution,
     the appearance frequency estimated for a given data pattern is compared with the appearance frequency of the data pattern having the X-th highest appearance frequency among the other data patterns that have already been searched and had their appearance frequencies estimated, and
     if the appearance frequency estimated for the given data pattern is lower than the appearance frequency of the data pattern having the X-th highest appearance frequency, the depth-first search of the branches beyond the given data pattern is terminated.
  8.  The information processing system according to claim 1, wherein the data input to the software under test is data registered in a database accessed by the software.
  9.  The information processing system according to claim 1, wherein each value that the element can take corresponds to one of a plurality of sections obtained by dividing the range that the element can take.
  10.  A control method for an information processing system, the method causing the information processing system to execute:
     a step of generating, based on data input to software under test, an N-dimensional frequency distribution that is a distribution of appearance frequencies for combinations of values that N elements can take; and
     a step of estimating an appearance frequency of a predetermined data pattern based on the N-dimensional frequency distribution.
PCT/JP2014/060484 2014-04-11 2014-04-11 Information processing system, which assists with selection of test case, and control method for said information processing system WO2015155881A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/060484 WO2015155881A1 (en) 2014-04-11 2014-04-11 Information processing system, which assists with selection of test case, and control method for said information processing system

Publications (1)

Publication Number Publication Date
WO2015155881A1 true WO2015155881A1 (en) 2015-10-15

Family

ID=54287482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/060484 WO2015155881A1 (en) 2014-04-11 2014-04-11 Information processing system, which assists with selection of test case, and control method for said information processing system

Country Status (1)

Country Link
WO (1) WO2015155881A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006227958A (en) * 2005-02-18 2006-08-31 Nomura Research Institute Ltd Test data generation system and method
US20120324289A1 (en) * 2011-06-15 2012-12-20 Ian Clive Funnell Method and apparatus for testing data warehouses
JP2014026458A (en) * 2012-07-26 2014-02-06 Toshiba Corp Test case generation support device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14889128

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14889128

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP