HK1242445A1 - Data-driven testing framework
Description
Cross Reference to Related Applications
This application claims priority to U.S. application serial No. 62/047,256, filed September 8, 2014.
Technical Field
The present disclosure relates to quality control, and more particularly, to an apparatus and method for identifying flaws or defects in a software application.
Background
A data processor must be reconfigured in order to convert it from a general-purpose computer into a special-purpose machine that performs a particular task. The resulting reconfiguration improves the general-purpose computer by giving it the ability to do something it could not do before. This reconfiguration is typically carried out by causing the general-purpose computer to execute specialized software. Such specialized software is commonly referred to as an "application" or "app."
For large projects, the application to be tested is designed and implemented by a team of engineers. The application is then provided to a quality assurance team, which is typically separate from the design team. The quality assurance team then proceeds to look for flaws or defects in the application.
The process of testing applications can be very difficult, for a variety of reasons. One reason is that the quality assurance team is essentially attempting to prove a negative, namely that there are no flaws or defects in the software being tested. It is often not economical to run enough tests to cover every possible scenario. It is therefore desirable to select the test data appropriately.
Another difficulty in testing applications is that the environment in which a test is performed may be subject to variation. The environment typically includes other software that is executing and the data that the application is to operate on. When the application being tested interacts with other executing software, it is important to know what that other software is. It is also important to have the correct data present, because the behavior of the application being tested depends to a large extent on the data provided to it. For example, an application may require certain data from a database. In such a case, one needs to know that the database contains the correct data. Quality assurance teams therefore typically take a number of measures to control the environment.
An additional difficulty in testing applications is establishing the integrity of the results. In some cases, it may be difficult to know what results should be considered "correct" or "incorrect" for a given set of input data processed under a particular environment.
Since testing is a major part of the software development lifecycle, it is useful to provide a more efficient method of testing.
Disclosure of Invention
In one aspect, the present disclosure describes an apparatus for testing an application. Such an apparatus includes a data processor having a memory and a processor operatively coupled to the memory. The data processor has been configured to implement a data driven testing framework that includes a data design module, a computing environment manager, and a results analysis module. The data design module is configured to create designed test data based at least in part on an application under test. The computing environment manager is configured to control a computing environment in which the application operates on the designed test data. Finally, the results analysis module is configured to compare the output produced when the application operates on the designed test data with an expected output.
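The division of labor among these three modules can be illustrated with a minimal structural sketch. The sketch below is in Python; all class and method names are illustrative assumptions rather than terms taken from the disclosure.

```python
# Minimal structural sketch of the apparatus described above.
# Names are illustrative; they do not come from the disclosure.

class DataDesignModule:
    def create_designed_test_data(self, application_under_test):
        """Return test data designed for the given application."""
        raise NotImplementedError


class ComputingEnvironmentManager:
    def run(self, application_under_test, designed_test_data):
        """Set up the environment, execute the application, return its output."""
        raise NotImplementedError


class ResultsAnalysisModule:
    def compare(self, actual_output, expected_output):
        """Return a report describing any discrepancies."""
        raise NotImplementedError


class DataDrivenTestingFramework:
    def __init__(self, design, environment, analysis):
        self.design = design
        self.environment = environment
        self.analysis = analysis

    def test(self, application_under_test, expected_output):
        data = self.design.create_designed_test_data(application_under_test)
        actual = self.environment.run(application_under_test, data)
        return self.analysis.compare(actual, expected_output)
```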
In some embodiments, the data design module is configured to extract a subset of the production data. The subset is selected to achieve a specified code coverage. The designed test data includes the subset of the production data.
In other embodiments, the data design module includes a data refiner that generates refined data from the production data.
Also included within the scope of the present invention is an embodiment wherein the data design module is configured to extract a subset of the production data and enhance the subset with additional data, thereby generating enhanced data. The additional data is selected to achieve a specified code coverage and the designed test data includes the enhanced data.
In some embodiments, the data design module includes a data refiner and a data enhancer for receiving refined data from the data refiner and enhancing the refined data.
Other embodiments include those where the data design module is configured to generate data based at least in part on the application under test. The generated data is selected to achieve a specified code coverage, and the designed test data includes the generated data.
Further embodiments include those in which the data design module further includes a positive data maker for generating positive data; those in which the data design module is configured to generate data, based at least in part on the application under test, that is not present in the production data; and those in which the data design module further includes a negative data maker for generating negative data.
In some embodiments, the data design module includes means for generating designed test data.
Still other embodiments include those in which the data design module includes an integrity checker for determining referential integrity of the designed test data; and embodiments wherein the data design module is further configured to detect errors in referential integrity.
Embodiments also include those in which the data design module includes a re-referencer for correcting a lack of referential integrity in the data prior to outputting the data as designed test data; and embodiments wherein the data design module is further configured to correct for the lack of referential integrity in the data.
Further embodiments are as follows: the data design module comprises a checking unit for receiving the designed test data and enabling a user to view the designed test data or to profile it; the data design module comprises a data checking unit for receiving the designed test data and enabling a user to view the designed test data; the data design module comprises a profiler for receiving the designed test data and enabling a user to profile the designed test data; the data design module is also configured to enable a user to profile the designed test data; and the data design module is further configured to enable a user to view the designed test data.
In some embodiments, the data design module includes a plurality of ways of generating the designed test data. In these embodiments, the choice of how to generate the designed test data depends at least in part on information related to the application under test. In other embodiments, the data design module includes a data enhancer, a data refiner, a negative data maker, and a positive data maker, each of which is configured to provide data forming the basis of the designed test data.
The present disclosure also includes the following embodiments: the data design module includes a logic extractor configured to identify logic functions to be tested in the application under test and provide those logic functions to a data refiner; and the data design module is further configured to identify logic functions to be tested in the application under test and to provide these logic functions to be used as a basis for obtaining the production data subset.
In a further embodiment, the computing environment manager comprises means for automatically setting up and tearing down a computing environment in which the application under test is tested.
The invention also includes the following embodiments: the computing environment manager includes an environment transformer. The environment transformer is configured to identify a source of the designed test data and is further configured to identify a target in which data obtained by the application under test processing the designed test data is placed.
In some embodiments, the environment transformer is further configured to copy the designed test data from the first repository to the source. In these embodiments, the environment transformer is further configured to copy the designed test data from the target to a second repository.
Embodiments of the present invention include those in which the computing environment manager includes an environment backup machine and an environment restorer. In such an embodiment, the environment backup machine is configured to back up a first environment prior to transforming the first environment into a second environment in which testing of the application under test is to occur. The environment restorer is configured to replace the second environment with the first environment.
In some embodiments, the computing environment manager includes an executor configured to cause the application under test to execute. These include embodiments in which the executor is configured to automatically execute scripts when causing the application to execute.
Still other embodiments include a computing environment manager that includes an environment transformer, an environment backup machine, an environment restorer, and an executor. In these embodiments, the environment transformer is configured to identify a source of the designed test data and is further configured to identify a target in which data obtained by the application under test processing the designed test data is placed; the environment backup machine is configured to back up a first environment prior to transforming the first environment into a second environment in which testing of the application under test is to occur; the environment restorer is configured to replace the second environment with the first environment; and the executor is configured to initiate execution of the application under test.
In another aspect, the invention features a method of processing data in a computing system. Such a method includes testing an application. Testing the application in this case comprises receiving information representing the application under test through either an input device or a port of the data processing system, and processing the received information. Processing the received information includes creating designed test data based at least in part on the information; controlling a computing environment in which the application operates on the designed test data; comparing the output produced when the application operates on the designed test data with an expected output; and outputting a result indicative of the comparison.
In another aspect, the invention features a computing system for testing an application. Such a computing system includes means for storing information and means for processing information. The means for processing information comprises means for data driven testing. The means for data driven testing includes means for receiving information through an input device and/or a port of a data processing system, the information representing the application to be tested. The means for data driven testing further comprises: means for generating a set of designed test data based at least in part on the application under test; means for managing a computing environment in which the application operates on the designed test data generated by the means for generating designed test data; and means for comparing the designed test data and the expected results with each other. The computing system also includes means for outputting an analysis of the results.
In another aspect, the invention features software, stored in a non-transitory form on a computer-readable medium, for managing the testing of an application. Such software includes instructions for causing a computing system to perform certain processing steps. These processing steps include: creating designed test data based at least in part on the application under test; controlling a computing environment in which the application operates on the designed test data; comparing the output produced when the application operates on the designed test data with expected outputs; and outputting an analysis of the comparison.
The above and other features of the present invention will become apparent from the following detailed description and the accompanying drawings.
Drawings
FIG. 1 is a schematic diagram of the structural relationships between the components of a data driven test framework for an application tester;
FIG. 2 shows a screen from a user interface;
FIG. 3 shows the screen after expansion of several of the boxes in FIG. 2;
FIG. 4 shows a diagram of a test using the input and output data files specified in FIG. 3;
FIG. 5 illustrates options for configuring an input data file;
FIG. 6 shows a box for specifying information for configuring a benchmark;
FIG. 7 illustrates options for record-by-record comparison;
FIG. 8 shows information about whether the test actually functions correctly;
FIG. 9 shows a summary of results based on a benchmarking application;
FIG. 10 shows the screen after expansion of other boxes in FIG. 2;
FIG. 11 illustrates an exemplary report on source level code coverage;
FIG. 12 is a schematic diagram of the structural relationships between the components of the data subset builder shown in the data driven test framework of FIG. 1;
FIG. 13 is a diagram of the structural relationships between components of the data maker shown in the data driven test framework of FIG. 1;
FIG. 14 is a diagram of the structural relationships between the components of the data enhancer shown in the data driven test framework of FIG. 1;
FIG. 15 is a diagram of the structural relationships between components of the environment manager of the data driven test framework of FIG. 1; and
FIG. 16 is an overview of the efficient testing process.
Detailed Description
More efficient testing can be achieved by ensuring that good data is available for testing, by providing a way to repeatably test applications so that they run automatically in a known environment, by collecting results that can be used to measure correctness or to evaluate the performance of the application under test, and by having a way to evaluate those results.
FIG. 1 shows a data-driven test framework 10, which data-driven test framework 10 is installed in a test computer 12 for methodically efficient testing of applications 14 on the test computer 12. As used herein, a "test computer" is intended to include one or more processing systems that cooperate to perform an application testing process.
FIG. 2 illustrates the first screen of a user interface that the data driven framework 10 is configured to provide for use in testing the application 14. The first screen has ten boxes. When clicked, each of these boxes expands to display deeper, hierarchically arranged boxes that offer the user a multitude of choices, as shown in FIG. 3. The boxes in FIGS. 2 and 3 are arranged in columns, from left to right, in an order consistent with the order of the tasks typically performed during testing of the application 14.
The first column of FIG. 2 shows a "Single Test" box, an "Input Data Sets" box, and an "Output Data Sets" box.
As shown in expanded form in FIG. 3, the "Single Test" box enables a user to configure a particular test, to specify a location where test data sets are to be saved, and to identify any graph, plan, or script for implementing custom logic relating to setting up or tearing down the test environment, or for performing analysis of test results.
The "input data set" and "output data set" boxes may enable a user to specify the location of the input and output data sets. In general, the output data sets are those that are changed by the application 14, while the input data sets are those that the application 14 uses to determine how to change the output data sets. For example, the application 14 may receive daily reports on revenue from each of a plurality of vehicles, and may update a database of accumulated revenue. In this case, the database to be updated is the "output" dataset and the daily revenue report is the "input" dataset.
The particular example shown in FIG. 3 relates to testing the graph shown in FIG. 4. That graph features five input data sets and two output data sets. In FIG. 3, the names of these data sets are listed in the appropriate "Input Data Sets" and "Output Data Sets" boxes.
FIG. 5 illustrates an input configuration box that is displayed when the "A-clients" data set in FIG. 3 is selected. The input configuration box enables a user to identify the name and type of the data set, examples of which include an input file and an input database table. The input configuration box also enables the user to specify the state of the input data set; one example of a data set's state is whether it is compressed. The input configuration box further enables the user to specify a path to the input data set and to specify a record format for the data set. The test framework 10 shows a similar box for each input and output data set specified.
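For concreteness, one entry from such an input configuration box might be represented as a small structure like the following Python sketch; the field names, path, and record format shown are hypothetical illustrations, not values taken from the disclosure.

```python
# Hypothetical representation of one input data set's configuration:
# name and type, compression state, path, and record format.
input_dataset_config = {
    "name": "A-clients",
    "type": "input file",              # could instead be an input database table
    "compressed": False,               # state of the input data set
    "path": "/testdata/input/a_clients.dat",
    "record_format": [
        ("customer_id", "string"),
        ("name", "string"),
        ("balance", "decimal"),
    ],
}
```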
When an application operates on data, the data is typically modified in some way. Whether the application 14 modifies the data correctly provides an important basis for deciding whether the application 14 operates correctly. However, it is usually not possible to simply inspect the modified data and declare it correct or incorrect. In general, the modified data must be compared with other data that is known to be correct. Data that is known to be correct is referred to as a "benchmark."
The second column of the first screen contains boxes concerned with verifying whether the application 14 correctly performs its function. This second column features a "Benchmark Comparison" box and a "Metrics" box.
The "metrics" box provides for enabling the user to specify what statistics should be displayed regarding application execution. This includes, for example, elapsed time, CPU time, and code coverage.
The "benchmark comparison" block enables a user to identify the benchmark data and perform certain operations on the benchmark data so that it serves as a benchmark. For example, it is possible that the reference data has some fields that are not present in the output data, or that some fields in the reference data do not match corresponding fields in the output data by nature. One example may be a date/time stamp, which has no effect but is different in both cases.
FIG. 6 shows a benchmark configuration box that is displayed when the "Configure benchmark …" option in the "Benchmark Comparison" box in FIG. 3 is selected. The benchmark configuration box gives the user an opportunity to select the type of comparison. An example of a type of comparison is a comparison against a serial file in a test data set repository or against an MFS file. The benchmark configuration box also gives the user an opportunity to specify where the benchmark is located, whether the benchmark is compressed, its record format, and any benchmark fields or output fields to be dropped prior to comparison.
As shown in FIG. 3, there are two ways to perform the comparison between the benchmark and the output of the application 14. One way is to perform a record-by-record comparison, which is indicated in FIG. 3 by the option "Configure record-by-record comparison." Another way is to examine aggregated data rather than comparing record by record. This is indicated in FIG. 3 by the option "Configure statistical comparison …." One example of the latter is determining whether the number of records in a data set matches an expected number of records.
FIG. 7 shows the options available when "Configure record-by-record comparison" in the "Benchmark Comparison" box of FIG. 3 is clicked. The available options include specifying the keys to be used in the comparison and specifying fields to be excluded from the comparison. Excluding fields is useful, for example, when a field holds a date/time stamp, since such stamps cannot be expected to match: the same time never occurs twice.
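The kind of keyed, field-excluding comparison just described can be sketched in a few lines of Python. This is only an illustration of the idea; the function and field names are assumptions, not part of the disclosed framework.

```python
def compare_record_by_record(outputs, benchmarks, key, exclude=()):
    """Compare output records against benchmark records matched by `key`,
    ignoring fields (such as date/time stamps) listed in `exclude`."""
    benchmark_by_key = {b[key]: b for b in benchmarks}
    discrepancies = []
    for record in outputs:
        baseline = benchmark_by_key.get(record[key])
        if baseline is None:
            discrepancies.append((record[key], "no matching benchmark record"))
            continue
        for field, value in record.items():
            if field in exclude or field == key:
                continue
            if baseline.get(field) != value:
                discrepancies.append((record[key], field))
    return discrepancies

# Example: ignore the timestamp field when comparing keyed records.
# compare_record_by_record(output_records, benchmark_records,
#                          key="customer_id", exclude=("last_updated",))
```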
The third column includes a "Single Test Run" box for controlling the actual execution of the test. The single test run box offers options for saving historical results and for running only the benchmark analysis.
The fourth and last column contains options concerning the analysis of results. A variety of reports can be generated. Before actually examining the test results, however, it is useful to determine whether the test itself ran correctly. In particular, it is useful to confirm that all input and output files were properly specified and that the steps of setting up the test, actually running it, and analyzing the results all completed successfully. This can be done by selecting "View Event Detail for Run" in the "Single Test Results" box of the fourth column, which produces a report like the one shown in FIG. 8. According to the report in FIG. 8, every step other than one particular analysis step ran correctly. Details of the step that produced the error can be found by clicking through the report.
Once the test is deemed to have run satisfactorily, reports comparing the test results with the benchmark can be examined. One such report, shown in FIG. 9, is a summary of the comparison between the benchmark and the results produced by the application 14 under test. The report is obtained by clicking "View Summary" in the "Benchmark Comparison Results" box of FIG. 3. The report shows the number of benchmark records and the number of records with discrepancies. The test results in FIG. 9 plainly indicate that the application being tested produced many errors.
In addition to looking at the number of errors the application generated and where those errors occurred, reports on code coverage can also be viewed. Code coverage can be expressed in a number of ways, including graph-level, component-level, and category-level coverage metrics. The available options can be viewed by clicking "Code Coverage Results" in FIG. 3, which expands the box to display the options shown in FIG. 10.
FIG. 11 shows an example of a report on source-level coverage metrics. The report is obtained by clicking "View Source-Level Code Coverage Metrics" in the "Code Coverage Results" box of FIG. 10.
The data driven test framework 10 thus provides the test computer 12 with functionality that the test computer 12 did not have before the data driven test framework 10 was installed. The illustrated data driven test framework 10 therefore provides a significant technical improvement in the operation of the test computer 12 on which it has been installed.
The application 14 to be tested may comprise object code obtained by compiling source code. In some embodiments, the source code represents a directed acyclic graph. In other embodiments, the source code represents a plan.
In some embodiments, the source code represents a graph. The nodes of such a graph define processing components having ports connected by directed links to enable data to flow between the components. In such a graph, a component may perform a computation by receiving input data at an input port, processing that data, and providing the resulting output at an output port.
In some embodiments, the source code represents a plan. A plan is a directed acyclic graph in which nodes represent tasks and directed links define dependencies between tasks such that a downstream task cannot begin until its upstream tasks complete. In some embodiments, tasks are used to execute graphs.
The compiled source code associated with the application 14 may also include information representing parameter sets, or "psets." A parameter set provides a list of parameters and a value corresponding to each of those parameters. In some embodiments, a parameter set is used to supply parameters that customize a graph.
The application 14 is not limited to applications whose source code represents a dataflow graph, a control-flow graph, or a plan. Embodiments also include those in which the application 14 comprises object code obtained by appropriately compiling or translating source code written in any computer language, e.g., C code or Java code. Further description of executing such applications is provided in the U.S. patent application publication entitled "Data Records Selection" (Isman et al., publication No. 2014/0222752, published August 7, 2014), the contents of which are incorporated herein by reference.
The application 14 typically implements rules that are triggered to execute by the values of one or more variables. These variables may be input variables corresponding to input data. Alternatively, they may be derived variables that depend on one or more input variables in the input data. In order to effectively test an application, it is sometimes necessary to provide test data sufficient to facilitate the execution of each logical rule in the application 14 in order to achieve complete code coverage in the application. It is also desirable to have the logic rules execute at least a corresponding minimum number of times, or, conversely, to have the logic rules execute no more than a maximum number of times.
The primary obstacle to effective testing is obtaining appropriate test data, namely data that satisfies the above requirements when the application 14 operates on it. The particular test data considered herein is data structured as a series of records, each record consisting of one or more fields.
One way to obtain test data is to use a sufficiently large extract of data from the production system. In principle, this approach relies on the idea that, with a large enough volume of test data, the likelihood of neglecting to test some feature of the code approaches zero.
The data volumes required, however, are generally enormous, and each test cycle would take an unreasonably long time.
To overcome the foregoing obstacles, the illustrated data driven test framework 10 includes a data design module 16 that generates designed test data for use in testing the application 14. Examples of how designed test data can be generated are described in the U.S. provisional application entitled "Data Generation" (Isman et al., application No. 61/917,727, filed December 18, 2013) and the U.S. patent application publication entitled "Data Records Selection" (Isman et al., publication No. 2014/0222752, application No. 13/827,558, filed March 14, 2013). The contents of both of the foregoing applications are incorporated herein by reference.
The data driven test framework 10 described herein takes advantage of the discovery that the total amount of data is not the only thing on which code coverage depends. Code coverage also depends on the nature of the data itself, and specifically on the logical concentration, or logical distribution, of the data. In practice, as long as the data actually used for testing is designed to be more logic-intensive, the desired code coverage can generally be achieved with a significantly smaller amount of data.
As used herein, the term "code coverage" is a measure of the extent to which source code has been tested by a test process. It may be expressed as a ratio (usually expressed as a percentage) of a first value representing a quantitative measure of the total amount of code to be measured and a second value representing the quantitative measure actually to be measured. In some cases, the first and second variables represent tested features for the implemented features. In other cases, the first and second variables represent a line of source code that has been tested and a total line of source code. Clearly, the exact nature of the quantitative measure is not important to understanding the present invention.
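As a simple illustration of this ratio (the function name and the sample numbers below are invented for the example):

```python
def code_coverage(measure_tested, measure_total):
    """Code coverage as a percentage, e.g. lines of code tested / total lines."""
    return 100.0 * measure_tested / measure_total

# 412 lines of source code exercised out of 500 gives 82.4% coverage.
print(code_coverage(412, 500))   # 82.4
```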
There is no requirement that the data driven test framework 10 achieve any particular code coverage, let alone 100% code coverage. Code coverage is a parameter set by the user based on engineering judgment. Whatever code coverage the user selects, however, the methods and apparatus described herein reduce the amount of test data required to achieve that coverage, and do so more reliably and predictably than attempting to reach the target coverage by simply throwing large volumes of production data at the application.
Specifically, a given set of test data will exercise a particular portion of the code. Different sets of test data will typically exercise different portions of the code. For example, if the test data simply repeats the same data record over and over again, it will exercise only a very limited subset of the code. Conversely, test data containing many records with a variety of combinations of values is more likely to exercise a larger subset of the code.
The data design module 16 includes one or more components selected from a set of components. Each component generates designed test data using a particular method. The choice of what method to use and therefore what components are needed depends on the particular circumstances at hand.
The components of the data design module 16 include one or more of a data subset constructor 18 (data subsetter), a data enhancer 20, a positive data maker 22, and a negative data maker 24. The data subset constructor 18 generates designed test data by refining existing data to increase its logical density. The data enhancer 20 generates designed test data by enhancing existing data. Both the positive data maker 22 and the negative data maker 24 create designed test data based on the requirements of the test.
There are also situations in which the data contains no category of data suitable for testing a particular piece of logic in the application 14. This does not mean, however, that the logic should not be tested.
If one relied only on existing test data to exercise that logic, the logic would never be tested, because there is no guarantee that refining any amount of existing data will produce data capable of exercising it. To accommodate these situations, certain embodiments of the data design module 16 include a negative data maker 24.
The negative data maker 24 provides data that would not normally be present. This extends the code coverage of the test, since code that would otherwise have no opportunity to be tested can be exercised. The negative data maker 24 differs from the positive data maker 22 in that the negative data maker 24 provides data (referred to herein as "negative data") that does not generally appear in a typical data set, or in a sample of one, whereas the positive data maker 22 generates data (referred to herein as "positive data") that does typically appear in a typical data set, or in a sample of one. Examples of negative data include field entries that do not conform to the field's format, e.g., field entries containing characters outside the character set predefined for the field, field entries whose values fall outside the field's predefined range, or field entries with the wrong number of characters in one or more portions of the entry. Examples would be a social-security number containing letters, or a birth month with a value of 0. Other examples of negative data are entries that conform to the field format but nevertheless break referential integrity; an example is a customer number that is correctly formatted but does not identify any existing customer. Using such negative test cases enhances code coverage. However, such negative data will generally not be present in a production data set and therefore typically has to be manufactured.
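A negative data maker of the kind just described might be sketched as follows. This is a minimal Python illustration, assuming hypothetical field names; it simply fabricates records that violate field formats or referential integrity in the ways listed above.

```python
import random
import string

def make_negative_record(existing_customer_numbers):
    """Fabricate a record that violates field formats and referential integrity:
    letters in a social-security number, a birth month of 0, and a customer
    number that is well-formed but identifies no existing customer."""
    bad_customer = "C{:06d}".format(random.randint(0, 999999))
    while bad_customer in existing_customer_numbers:   # keep drawing until it references nobody
        bad_customer = "C{:06d}".format(random.randint(0, 999999))
    return {
        "ssn": "12A-45-67" + random.choice(string.ascii_uppercase),  # letters where digits belong
        "birth_month": 0,                                            # outside the 1-12 range
        "customer_number": bad_customer,                             # breaks referential integrity
    }

# Example: one negative record that no production data set would contain.
print(make_negative_record({"C000123", "C004711"}))
```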
Because the designed test data will already have been generated, interactive debugging of the application 14 can easily be carried out as the application 14 is developed. This is far more productive than processing huge data sets, which can take tens of minutes or even hours. For example, when the designed test data is used in a local environment, the effect of changing a rule can be seen on individual records in a business-rules environment.
The data subset constructor 18 generates a set of designed test data that is small enough that the developer of the application 14 can quickly see the effect of making changes to the application 14. The designed set of test data is not only small, however; it also has a high test-logic density. Because of its high test-logic density, the designed test data exercises all of the code of the application 14 without requiring an entire data set. High code coverage is thus achieved without the computational cost of processing the entire data set.
FIG. 12 shows details of the data subset constructor 18. The data subset constructor 18 receives actual production data 26 (or any input data set to be subsetted), a logic specification 28, and control variables 30. A logic extractor 31 identifies the logic functions to be tested and provides them to a data refiner 32; both the logic extractor 31 and the data refiner 32 are part of the data subset constructor 18. The data refiner 32 then processes the production data 26 to generate a data essence 33 (data distillate). It does so by using the extraction process specified by the control variables 30 to extract those portions of the data that are relevant to testing the logic specified by the logic extractor 31. The data refiner 32 thus extracts a portion of the data from the input data set using a specified extraction process, and the resulting extracted data is referred to as the "data essence."
A data essence 33 is selected from production data 26 based on the subset construction rules. These subset construction rules may come from multiple sources. In one example, a user specifies a subset construction rule. In another example, subset construction rules are formulated based on feedback from executing applications. In another example, data essence 33 includes a data record that may cause some or all of the code in application 14 to be executed.
As one example, production data 26 may include data records that each contain a plurality of fields, each field having particular allowed values, some of which occur more often than others. Different allowed values exercise different portions of the code. Thus, to fully test the code, all combinations of values must occur. In some embodiments, the designed test data is derived by making the low-probability values more likely to occur, so that fewer records are needed to obtain all combinations of allowed values.
In this case, the designed test data can be regarded as data in which the probability distribution over record values has been made more uniform. In other words, if the probability that a particular allowed value appears in the production data 26 is relatively low, then the probability that the value appears in the designed test data is made relatively high. Conversely, if the probability that a particular allowed value appears in the production data 26 is relatively high, then the probability that the value appears in the designed test data is made relatively low. The net effect is designed test data in which the most probable values have reduced probability and the least probable values have increased probability. This reduces the spread of the probability values; the extreme case is a uniform distribution, in which the spread of the probability values is zero. A reduction in the overall spread of probability values thus pushes the distribution toward a uniform one. This tends to produce a more efficient data set for testing, because the redundancy contributed by the more probable values is reduced, while the amount of data needed to ensure that the least probable values are present is also reduced. The degree of efficiency corresponds to the test-logic density of the designed test data.
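One simple way to approximate this flattening of the value distribution is to cap the number of records retained for each distinct combination of field values. The Python sketch below is an illustration under that assumption, not the subset-construction method of the framework itself.

```python
from collections import defaultdict

def build_logic_dense_subset(records, fields, per_combination=2):
    """Keep at most `per_combination` records for each distinct combination of
    values in `fields`. Frequent combinations shed their redundant records, so
    rare combinations make up a larger share of the subset and the value
    distribution becomes more nearly uniform."""
    seen = defaultdict(int)
    subset = []
    for record in records:
        combination = tuple(record[f] for f in fields)
        if seen[combination] < per_combination:
            seen[combination] += 1
            subset.append(record)
    return subset
```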
In many cases, the production data 26 will consist of multiple tables in a database. The tables may be coupled by having records in a first table that contain pointers that point to, or "reference," records in a second table.
Whenever a pointer points to some record, there are two possibilities: (1) the pointer points to a valid record, and (2) the pointer does not point to a valid record.
In the first possibility, every pointer in the first table points to a valid record in the second table. In this case, the two tables are described as having "referential integrity." As used herein, the term "referential integrity" describes one or more data sets in which every reference from one portion of the data set to a value in another portion of the data set is valid.
In the second possibility described above, at least one pointer in the first table does not point to a valid record in the second table. In this second possibility, both tables are described as lacking referential integrity.
For proper testing, it is preferable that, if the production data 26 has referential integrity, the designed test data should have it as well. The data refiner 32 should therefore provide a data essence 33 that maintains referential integrity.
To determine whether referential integrity has been maintained, the data refiner 32 provides the data essence 33 to an integrity checker 34. If the integrity checker 34 determines that the data essence has referential integrity, the data essence 33 is provided as the output data subset 35 of the data subset constructor 18. Otherwise the data essence 33 is provided to a re-referencer 36 for repair and is then provided as the output data subset 35.
In some embodiments, the re-referencer 36 and the data enhancer 20 perform the same function. For example, if referential integrity is lost because a pointer in one data set does not point to a record in another data set, the re-referencer 36 can augment the second data set with the appropriate record, using the same method used by the data enhancer 20. The re-referencer 36 may thus be considered an optional constituent of the data design module 16.
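The integrity check and the repair step can be sketched together as follows. This is an illustrative Python sketch, assuming simple in-memory tables and hypothetical key names; the actual integrity checker 34 and re-referencer 36 are not specified at this level of detail in the disclosure.

```python
def missing_references(child_records, parent_records, foreign_key, parent_key):
    """Return the foreign-key values in the child table that do not point to
    any record in the parent table, i.e. losses of referential integrity."""
    parent_keys = {p[parent_key] for p in parent_records}
    return {c[foreign_key] for c in child_records} - parent_keys

def repair_references(child_records, parent_records, foreign_key, parent_key):
    """Restore referential integrity by augmenting the parent table with a
    minimal placeholder record for every dangling reference."""
    for key in missing_references(child_records, parent_records, foreign_key, parent_key):
        parent_records.append({parent_key: key})
    return parent_records

# Example: orders referencing a customer that the subset no longer contains.
orders = [{"order_id": 1, "customer_id": "C1"}, {"order_id": 2, "customer_id": "C9"}]
customers = [{"customer_id": "C1"}]
repair_references(orders, customers, "customer_id", "customer_id")   # adds a stub for "C9"
```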
In certain embodiments, as shown, the data subset constructor 18 also includes a data verification unit 37 capable of viewing and/or profiling the output data subset 35. In other embodiments, the data verification unit 37 is absent.
Embodiments having a data verification unit 37 include those in which the data verification unit 37 is a viewer and those in which it is a profiler. Also included are embodiments in which the data verification unit 37 can both view and profile the data, depending on what the user wishes to do.
As used herein, "generalizing" a subset of data may include, for example, obtaining metadata or aggregated data about the subset, and the result of the generalization is referred to as "side-writing (profile)". The aggregated data includes a number of features, such as the number of records, the range of values in the records, and statistical or probabilistic descriptions of values in the data (e.g., the n-th moment of a probability distribution, where n is a positive integer).
Sometimes, for example when developing a new system, there is no production data available for refinement. In other cases, production data may be difficult to obtain. To accommodate these situations, one approach is to activate the positive data maker 22 of the data design module 16.
Referring to FIG. 13, the positive data maker 22 receives the logic specification 28, the control variables 30, and key-relation information 38. The logic extractor 31 identifies the logic functions to be tested and provides them to a data generator 40. The data generator 40 then generates appropriate test data using the process specified by the control variables 30. Examples of how data can be generated are described in the U.S. provisional application entitled "Data Generation" (Isman et al., application No. 61/917,727, filed December 18, 2013) and the U.S. patent application publication entitled "Data Records Selection" (Isman et al., publication No. 2014/0222752, published August 7, 2014).
For proper testing, the resulting manufactured test data 39 preferably has referential integrity. Accordingly, the manufactured test data 39 is provided to the integrity checker 34 to determine whether referential integrity has been established. If the integrity checker 34 determines that the manufactured data has referential integrity, the manufactured test data 39 is provided as the positive data maker output 41. Otherwise, the manufactured test data 39 is provided to the re-referencer 36 for repair and is then provided as the output of the positive data maker 22.
In some embodiments, the positive data maker 22 also includes a data verification unit 37 capable of viewing and/or profiling the manufactured test data 39 within the data driven test framework 10. In other embodiments, the data verification unit 37 is absent.
In some cases, production data 26 is present, but not completely in the desired form. In such cases, it may be useful to enhance the production data by activating the data enhancer 20 of the data design module 16.
For example, data enhancer 20 may be used to add one or more fields to existing production data 26 based on provided rules and generate data to populate the fields.
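As an illustration of this kind of enhancement, the Python sketch below adds rule-derived fields to existing records; the rule and field names are hypothetical and not drawn from the disclosure.

```python
def enhance_records(records, new_fields):
    """Add fields to existing production records according to provided rules.
    `new_fields` maps each new field name to a function that derives its value
    from the record (for instance, a category derived from a balance)."""
    enhanced = []
    for record in records:
        augmented = dict(record)
        for name, rule in new_fields.items():
            augmented[name] = rule(record)
        enhanced.append(augmented)
    return enhanced

# Example: tag each record with a hypothetical "high_value" flag.
production = [{"customer_id": "C1", "balance": 25000}]
print(enhance_records(production, {"high_value": lambda r: r["balance"] > 10000}))
```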
FIG. 14 shows details of the data enhancer 20. The data enhancer 20 receives actual production data 26 (or any input data set to be enhanced), the logic specification 28, and the control variables 30. The logic extractor 31 identifies the logic functions to be tested and provides them to the data refiner 32 and to a data adjuster 48. The data refiner 32 then processes the production data 26, using the extraction process specified by the control variables 30, to extract those portions that are relevant to testing the logic specified by the logic extractor 31. Based on the information provided by the logic extractor 31, the data adjuster 48 adds the appropriate fields and enters the appropriate values into those fields, thereby generating enhanced data 49.
For proper testing, the enhanced data 49 provided by the data adjuster 48 preferably has referential integrity. The enhanced data 49 is therefore provided to the integrity checker 34 to determine whether referential integrity has been maintained. If the integrity checker 34 determines that the enhanced data 49 has referential integrity, the enhanced data 49 is provided as the enhanced data output 51 of the data enhancer 20. Otherwise, the enhanced data 49 is provided to the re-referencer 36 for repair and is then provided as the enhanced data output 51 of the data enhancer 20.
In some embodiments, the data enhancer 20 also includes a data verification unit 37 that can view and/or profile the enhanced data output 51 within the data driven test framework 10. In other embodiments, the data enhancer 20 has no data verification unit.
In some cases, one may wish to exercise code segments that would not be exercised by any data typically present in production data. To address this, the data design module includes the negative data maker 24, whose function is to create such negative test conditions.
A second obstacle to efficient testing arises from the need to set up, control, and then tear down the test environment.
Typically, testing involves running a suite of multiple tests against one or more graphs and plans that interact with multiple external data sets. These data sets may come from files, tables, queues, multifiles, and web services. To accomplish the task of having the application 14 execute a test suite, the data driven test framework 10 provides a computing environment manager 44.
The computing environment manager 44 carries out the task of running the application 14 in a controlled manner, in a known environment, using known inputs. This provides flexibility in specifying the particular application to be tested. The computing environment manager 44 maintains a repository of folders containing aggregated data corresponding to the input data to be processed by the application 14, flags, output paths, and customizable logic for setup, teardown, and reporting.
The computing environment manager 44 automatically creates data sets, whether as files or as tables. These data sets include the source (i.e., the data to be operated on by the application 14) and the target (i.e., the place where the processing results produced by the application 14 are ultimately deposited). The environment manager 44 then automatically sets the source and target to the correct initial states, runs the application 14 against the appropriate test suite, stores the results in the target, and restores the environment to its pre-test condition. In some cases, the environment manager 44 backs up the previous environment and restores it after testing is complete. This automatic setup and teardown of environments facilitates iterative testing with minimal manual labor.
A computer system can be viewed as a set of nested layers of increasing abstraction. Each layer creates logical structures that can be used by the layers at higher levels of abstraction. These logical structures include stored values of state and environment variables.
When an application executes, it can be regarded as executing on top of these layers. The set of logical structures created by the lower layers can be regarded as the environment in which the application executes. To properly test an application, it is preferable to maintain a consistent environment, in much the same way that properly testing a physical structure often depends on maintaining a constant physical environment.
Referring now to FIG. 15, in one embodiment the computing environment manager 44 includes an environment transformer 46 that carries out two environment transformations: one during a setup phase and the other during a teardown phase.
The environment transformer 46 receives an input specification 53 and an output specification 50. The input specification 53 identifies the source 52 of the input test data. The input may be a file, a multifile, a queue, a web service, or any combination thereof. The output specification 50 identifies the target 54 in which the test output is to be deposited. The environment transformer 46 also receives an initialization signal 56, which contains information about the initial states of the inputs, the outputs, and any environment variables. Finally, the environment transformer 46 receives a test signal 58 indicating that testing is to begin.
In some embodiments, during the setup phase, the environment transformer 46 copies test data and/or benchmark data from a first data repository into the source 52, where the data remains during the actual testing process. After the testing process is complete, the teardown phase begins. During this teardown phase, the environment transformer 46 deletes the test data from the target 54.
Upon receiving the test signal 58, the environment transformer 46 communicates with an environment backup machine 60 to create a backup 62 of the environment. It then directs the input switch 64 to the appropriate source 52 and the output switch 66 to the appropriate target 54.
Upon completion of these tasks, the environment transformer 46 sends a signal to an executor 68 to cause the application 14 to execute a test suite 79 comprising one or more tests 80. In some implementations, execution of the test suite includes automatically executing one or more scripts. Upon completion of execution, the executor 68 sends a signal to an environment restorer 70, which then retrieves the backup 62 and restores the environment to its original state.
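The backup, transform, execute, and restore sequence just described can be sketched in Python using ordinary directory operations. This is only an illustration under the assumption that the source, target, and test data are plain directories of files; the actual environment transformer 46, backup machine 60, executor 68, and restorer 70 are not limited to such an arrangement.

```python
import os
import shutil
import tempfile

def run_in_managed_environment(source_dir, target_dir, test_data_dir, execute_suite):
    """Back up the environment, stage the designed test data in the source,
    run the test suite, then restore the environment to its original state."""
    backup = tempfile.mkdtemp(prefix="env_backup_")
    shutil.copytree(source_dir, os.path.join(backup, "source"))      # environment backup 62
    shutil.copytree(target_dir, os.path.join(backup, "target"))
    try:
        for name in os.listdir(test_data_dir):                       # setup phase: copy test data in
            shutil.copy(os.path.join(test_data_dir, name), source_dir)
        return execute_suite()                                       # executor runs the test suite
    finally:
        shutil.rmtree(source_dir)                                    # teardown phase: discard test state
        shutil.rmtree(target_dir)
        shutil.copytree(os.path.join(backup, "source"), source_dir)  # restore original environment
        shutil.copytree(os.path.join(backup, "target"), target_dir)
        shutil.rmtree(backup)
```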
In execution, the application 14 implements one or more rules. In some embodiments, a rule is specified by a specification that includes at least a conditional expression and an execution expression. When the conditional expression evaluates to "true," the application 14 goes on to evaluate the execution expression. Whether a conditional expression evaluates to "true" may depend on the values of one or more variables in the data. These may be input variables corresponding to the input data, or they may be derived variables that depend on one or more input variables. Thus, whether the application 14 executes a given rule in a particular test run ultimately depends on whether the chosen test data contains variable values that cause the conditional expression for that rule to evaluate to "true."
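The relationship between rules and test data can be made concrete with a small sketch. The two rules below are invented examples; the point is only that a rule fires for a record when, and only when, its conditional expression is true for that record, which is why the designed test data must contain values that trigger every rule one wishes to cover.

```python
# Each rule pairs a conditional expression with an execution expression.
rules = [
    ("overdrawn", lambda r: r["balance"] < 0,         lambda r: {**r, "status": "overdrawn"}),
    ("dormant",   lambda r: r["days_inactive"] > 365, lambda r: {**r, "status": "dormant"}),
]

def apply_rules(record, rules):
    """Apply every rule whose condition holds; report which rules fired."""
    fired = []
    for name, condition, action in rules:
        if condition(record):          # the test data must make this true
            record = action(record)    # for the rule to be exercised at all
            fired.append(name)
    return record, fired

# This record triggers the "overdrawn" rule; a record with balance >= 0 and
# days_inactive <= 365 would exercise neither rule.
print(apply_rules({"balance": -5, "days_inactive": 10}, rules))
```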
In some examples, the application 14 executes all of the rules that are triggered. In other embodiments, the application 14 executes fewer than all of the triggered rules. Rules are described in more detail at column 5, line 61 through column 6, line 11 of U.S. patent No. 8,069,129, filed April 10, 2007, the contents of which are incorporated herein by reference.
Once the executor 68 completes the test suite 79, a results analysis module 72 takes over and begins analyzing the test results. Among the functions of the results analysis module 72 are creating sets of known correct results and verifying that the application 14 under test actually arrives at the correct answers.
In some cases, there is an older version of the application being tested, typically the version currently in use. Such a version can be treated as the gold standard for establishing the correctness of the output. Accordingly, the older version that the application under test is intended to replace is referred to herein as the "gold standard version."
If the version of the application being tested does not yield results consistent with those obtained by the gold standard version when operating on the same data in the same environment, one can infer that the version of the application being tested is producing erroneous results.
One step that occurs in the process of testing the application 14 is determining whether the application 14 has in fact processed the data correctly. To perform this step, there must be a way to establish a correspondence between the expected results of operating on a data set (as defined by the functional specification of the application 14) and the measured results of operating on the same data set (as acquired by the executor 68). In other words, a benchmark 74 of correct answers must be obtained. Once such a benchmark 74 is available, the results analysis module 72 can verify the results 78 by comparing them with the benchmark 74.
The method of obtaining the benchmark 74 depends in part on how much the application 14 deviates from any application it replaces. In general, the larger the deviation, the more difficult it is to generate the benchmark.
At an abstract level, given a dataset X and an environment E, version n of an application f produces the output Y = fn(X, E). The question is how to determine whether Y is correct.
There are generally three possibilities.
The first possibility is that there exists a different version of the application, version m, that can operate on (X, E). If version m is deemed reliable, the correctness of the result Y can be established by asking whether fn(X, E) equals fm(X, E).
The second possibility is that there exists another version of the application, version m, but that version is considered only partially reliable. In this case, one must ask whether fn(Z, E) equals fm(Z, E), where Z is a subset of X on which fm is considered reliable, while fm is not reliable on ZC, the complement of Z in X. To establish the correctness of fn(ZC, E), the correct results must usually be determined manually.
The third possibility is that no version of the application is known to be reliable. This is just the degenerate case of the second possibility in which Z is empty. In this case, the correct results are determined manually.
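These three cases can be folded into one benchmark-building routine. The Python sketch below is an illustration only; `manually_verified_result` stands in for the manual determination described above and is a hypothetical placeholder, not part of the disclosed framework.

```python
def manually_verified_result(record):
    """Stand-in for a person supplying the known-correct result for a record."""
    raise NotImplementedError("correct result must be determined manually")

def build_benchmark(x_records, env, f_old=None, reliable_keys=None, key="id"):
    """Assemble a benchmark for the three cases above: a fully reliable old
    version (reliable_keys is None), an old version reliable only on the
    subset Z identified by reliable_keys, or no reliable version (f_old is None)."""
    benchmark = {}
    for record in x_records:
        k = record[key]
        if f_old is not None and (reliable_keys is None or k in reliable_keys):
            benchmark[k] = f_old(record, env)                  # gold-standard output on Z
        else:
            benchmark[k] = manually_verified_result(record)    # manual benchmarking elsewhere
    return benchmark
```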
One method of obtaining the benchmark 74 is useful when the application 14 under test is to replace a current application having substantially the same functionality. This corresponds to the first possibility defined above. In this case, the benchmark 74 can be derived from results generated by the gold standard version of the application.
In some cases, the application 14 under test is an enhancement of an existing application. Because of the enhancement, the application 14 under test is expected to, and actually will, produce different results. This situation, which corresponds to the second possibility described above, can arise, for example, if the gold standard version has a bug that causes incorrect output and the application 14 under test fixes that bug.
For these cases, the results analysis module 72 reports which fields have changed and/or whether the number of records in the output has changed. The results analysis module 72 reports any mismatches so that one can immediately recognize whether certain fields changed unexpectedly when no change was intended. Human intervention may be required to determine the correct answers for the fields that are expected to change and to enter those answers into the benchmark 74.
In other cases, the application under test 14 is a completely new system. This corresponds to the third possibility outlined above. Thus, there is no existing output data that can be used as a basis for creating benchmarks 74.
In this case, one starts with the existing production data 26 and constructs the benchmark 74 by entering the correct results (e.g., manually) for a subset of the production data 26. This can be done by examining the underlying logic of the application 14 to be tested and identifying, based on that logic, the fields in the source data that are likely to be most affected by the various logical paths through the application 14. These are the fields that should be considered when selecting the data subset.
In some cases, certain simple tests can be performed automatically without the need for a benchmark 74. For example, if the application 14 is known to produce one output record for each input record, the application 14 can be made to operate on production data 26 of known cardinality, in which case the cardinality of the output data provides some information about the functioning of the application 14. In particular, to the extent that there is a non-zero deviation between the cardinality of the production data 26 and the cardinality of the output produced when the application 14 operates on the production data 26, the results analysis module 72 can automatically indicate a probable flaw in the implementation of the application 14.
For example, in some cases the application 14 produces an output that includes several components of different cardinalities, where known relationships exist between those cardinalities. In one example, the application 14 operates on input in the source 52 and generates two separate tables in the target 54. To the extent that a relationship exists between the cardinalities of the two tables, the results analysis module 72 automatically detects deviations from that relationship and outputs information indicating a flaw in the implementation of the application 14.
In another example, an input form in source 52 may have N records. If it is known that the output table in target 54 should also have N records, then checking the number of records in the output table is a good way to check how well the software is running. For example, if it is observed that there are N +1 records in the output when there are only N records in the input, this indicates that there may be an error.
In another example (a generalization of the foregoing example), an application is known to change the number of records in some determined manner. In general, if the number of output records for an input table of N records is f(N) for some known function f, then one way to identify an error in the application is to check whether the output table actually has f(N) records when the input table has N records.
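By way of illustration only, the following Python sketch shows such a record-count check; the record lists, the invariant function f, and the reporting format are assumptions made for this example and are not part of the framework described herein.

```python
# Minimal sketch (not the patented implementation): checking a known
# record-count invariant f(N) between an input table and an output table.
# The record lists and the function f are illustrative assumptions.

def check_record_count_invariant(input_records, output_records, f):
    """Return None if the output cardinality matches f(len(input)), else a message."""
    expected = f(len(input_records))
    actual = len(output_records)
    if actual != expected:
        return f"possible flaw: expected {expected} output records, found {actual}"
    return None

# Example: an application known to emit exactly one output record per input record.
issue = check_record_count_invariant(
    input_records=[{"id": 1}, {"id": 2}, {"id": 3}],
    output_records=[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}],  # one record too many
    f=lambda n: n,
)
print(issue)  # flags the N+1 discrepancy described above
```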
After execution, it is useful to provide reports giving information indicative of the execution of the application 14, particularly information regarding the interaction of the application 14 with the test data provided to it. Examples of such information include which rules have or have not been executed by the application 14, the number of times each rule in the application 14 was executed, or any other information that may explain the interaction between the application 14 and the test data.
Based on the report, the user may identify additional test data. The additional test data may be, for example, data that causes any unexecuted rules to be executed, data that causes a particular logic rule to be executed a particular number of times, or data that causes other desired execution results. The user may then formulate new subset construction rules to initiate selection of an updated subset of data records in accordance with these additional subset construction rules. The updated subset of data records may include: data records sufficient to enable some or all of the previously unexecuted rules to be executed; data records sufficient for some or all of the rules to be executed a specified number of times; or a data record sufficient to cause other desired execution results.
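A hedged sketch of this idea follows; the coverage-report format (a mapping from rule names to execution counts) and the per-rule trigger predicates are assumptions chosen for illustration rather than interfaces of the framework.

```python
# Illustrative sketch: turning a coverage report into additional subset
# construction rules. The report format and the per-rule predicates are
# assumptions made for illustration.

def rules_needing_data(coverage_report, required_counts=None):
    """Yield rule names whose execution count is below the desired count (default 1)."""
    required_counts = required_counts or {}
    for rule, count in coverage_report.items():
        if count < required_counts.get(rule, 1):
            yield rule

def build_subset_rules(under_exercised_rules, trigger_predicates):
    """Map each under-exercised rule to a record-selection predicate, if one is known."""
    return {r: trigger_predicates[r] for r in under_exercised_rules if r in trigger_predicates}

coverage = {"rule_discount": 12, "rule_overdraft": 0}
predicates = {"rule_overdraft": lambda rec: rec["balance"] < 0}  # hypothetical trigger
new_rules = build_subset_rules(rules_needing_data(coverage), predicates)
```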
The results analysis module 72 may provide, among other types of information, reports of the extent to which the test data exercised the code. Such a report includes a cumulative score (e.g., the percentage of code lines tested) and more detailed information (e.g., which lines of code were not tested). This information enables the user to determine whether the testing is adequate, based on the percentage of code tested and the importance of the code that the testing ignored.
FIG. 16 provides a comprehensive summary of an efficient testing process using the components described herein. The test procedure is generally divided into data-related steps 82 and application-related steps 84.
The data-related steps 82 include running a side-write on any existing production data. This is shown in FIG. 16 as step 86, which is labeled with the text "Side Write Production Data".
The next data-related step is to obtain certain aggregated data about the production data from the side-writes. This is shown in FIG. 16 as step 88, which is labeled with the text "Get Metadata". It should be understood that "metadata" here refers to such aggregated data. Examples of such aggregated data include, but are not limited to, key lists, field cardinalities, and ranges of values.
This metadata (or "aggregated data") is used to generate a reference full subset of the data, as shown in FIG. 16 as step 90, which is labeled with the text "Make Reference Full Subset".
Some practices include enhancing the reference full data subset by creating and including negative test data. This is shown in FIG. 16 as step 92 and is labeled with the text "Create Negative Data".
Other practices include enhancing the reference full data subset by manufacturing new data. This is shown in FIG. 16 as step 94 and is labeled with the text "Manufacture New Data".
The application-related steps 84 include building an application, or adjusting an existing application by repairing or enhancing it in some manner. The step of building an application is shown in FIG. 16 as step 96 and is labeled "Build APP". The step of adjusting an existing application by repairing or enhancing it is shown in FIG. 16 as step 98 and is labeled "Adjust APP". The abbreviation "APP" throughout FIG. 16 refers to the application 14.
The application-related steps 84 also include registering the application 14 into a repository together with a dependency analysis, which captures how the computing modules of the application depend on the data sets accessed or generated by the application. This is shown in FIG. 16 as step 100 and is labeled with the text "Check-in APP, Dependency Analysis".
The application is then caused to operate on the designed test data, as indicated in FIG. 16 by step 102, which is labeled "run APP on engineering data".
The results are examined to determine code coverage, as shown in step 104 of FIG. 16, which is labeled with the word "report code coverage".
Based on these coverage reports, the data driven test framework 10 provides recommendations for how the test data can be adjusted to provide better code coverage. This is shown in FIG. 16 as step 106 and is labeled with the text "proposed method for improving code coverage".
The result of step 106 optionally leads to an adjustment of the data design process, either by creating additional data or by varying the manner in which a subset of data is extracted from the existing data. This step is shown in FIG. 16 as step 108 and is labeled "adjusting the data design".
In addition, the output data is compared to a benchmark 74 to assess the integrity of the output data, which is shown in FIG. 16 as step 110 and labeled "determine correct outcome of APP".
To the extent that the results differ, the application 14 is adjusted to eliminate the deviation, as shown in step 98 of FIG. 16, which is labeled "adjust application". The determination of whether there is a deviation is performed in the step identified by reference numeral 112 in FIG. 16, which is labeled "compare result with expected result".
In some embodiments, the data refiner 32 refines the production data 26 according to one or more subset construction rules. The subset construction rules are rules that cause the data refiner 32 to identify, within a larger set of data records, a subset of data records to be chosen. The resulting data essence 33 is thus smaller in volume than the original data but has a higher density of test logic. This ultimately results in more efficient testing, because higher code coverage can be achieved with a lower data volume when the application 14 operates on the data essence 33.
The subset construction rules that the data refiner 32 relies on may originate internally in the data design module 16, elsewhere in the data driven test framework 10, or from external sources.
In one example, the logic extractor 31 provides the subset construction rules. The logic extractor 31 uses the logic specification 28 to produce side-writes of the data records and formulates the subset construction rules based on analysis of the resulting side-writes. These subset construction rules are then provided to the data refiner 32, which uses them to create the data essence 33.
In another example, the subset construction rules come from the results analysis module 72, which relies on information containing the results of having executed the application 14 on particular test data. The data subset constructor 18 formulates subset construction rules based on analysis of these results (e.g., based on reports from the results analysis module 72). These subset construction rules are ultimately applied by the data refiner 32 to create the data essence 33.
In another example, rather than formulating the subset construction rules, the data subset constructor 18 receives the subset construction rules from an external source. In some cases, the data subset builder 18 receives the subset construction rules directly from a user who is actually seated in front of the testing computer 12 and who manually specifies the subset construction rules through a user interface. In other cases, the data subset builder 18 obtains the subset construction rules by having the test computer 12 read them from a non-transitory computer-readable storage medium (e.g., a hard disk) or having the test computer 12 receive them via a non-transitory computer-accessible transmission medium (e.g., a network, including a wide area network, such as the internet).
The subset construction rules are atomic or molecular, whether externally received or internally generated. An atomic subset construction rule cannot be further decomposed into multiple subset construction rules. A molecular subset construction rule consists of a combination of two or more atomic or molecular subset construction rules. Typically, boolean logic operators combine atomic subset construction rules to form molecular subset construction rules.
The subset construction rules may also be deterministic or random. An example of a deterministic subset construction rule is a rule that causes all records matching a particular criterion to be selected. An example of a random subset construction rule is a rule that randomly selects two records from among all records that match a particular criterion.
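These distinctions can be illustrated with a short Python sketch; the field names, the threshold, and the choice of two randomly sampled records are assumptions for illustration and do not reflect the framework's actual rule representation.

```python
# Illustrative sketch of atomic vs. molecular and deterministic vs. random
# subset construction rules, expressed as predicates over individual records
# plus one set-level random selection. Field names are assumptions.
import random

def atomic_state_is(value):                 # atomic, deterministic
    return lambda rec: rec.get("state") == value

def atomic_balance_over(threshold):         # atomic, deterministic
    return lambda rec: rec.get("balance", 0) > threshold

def molecular_and(*rules):                  # molecular: Boolean combination of rules
    return lambda rec: all(rule(rec) for rule in rules)

def random_sample(records, rule, k=2, seed=0):  # random: pick k of the matching records
    matches = [r for r in records if rule(r)]
    return random.Random(seed).sample(matches, min(k, len(matches)))

records = [{"state": "NY", "balance": 500}, {"state": "NY", "balance": 5},
           {"state": "CA", "balance": 900}, {"state": "NY", "balance": 800}]
rule = molecular_and(atomic_state_is("NY"), atomic_balance_over(100))
print(random_sample(records, rule))
```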
In some examples, the subset construction rule specifies one or more target data fields and specifies that a respective unique value or value classification for each target data field is included in data essence 33. To implement this example, the data refiner 32 identifies individual values of the target data fields in the data records and creates a data essence 33 having only those data records that satisfy the subset construction rules.
For example, a "status" data field having a unique value for each of the fifty states and a "gender" data field having two unique values may be identified as target data fields. In this case, data refiner 32 selects the data records corresponding to data serum 33 such that each of the fifty values for "status" and each of the two values for "gender" are included in at least one data record of data serum 33.
In some examples, the data subset constructor 18 implements subset construction rules that specify the type of relationship between data records, either in the same set of data records or between different sets of data records. In these examples, the data refiner 32 selects the data records based on relationships between the data records and other data records selected for the subset. For example, data refiner 32 may select a plurality of data records sharing a common value for the customer identifier data field to be included in data essence 33.
The data subset constructor 18 may also implement filter-dependent subset construction rules. In these cases, the data refiner 32 includes within the data essence 33 records having particular values in particular target fields. For example, the data refiner 32 may select records such that each value of "state" is represented at least once. Alternatively, the data refiner 32 may implement an allocation scheme by considering the value of a "population" field and selecting data records such that the number of records having a given value of "state" depends on the value of "population" associated with that state.
In some examples, a user (such as a data analyst or application developer) provides the subset construction rules. For example, a user may identify target fields or specify relationships between data records and provide such a specification to the data subset constructor 18.
In other examples, the data subset constructor 18 side-writes the data records and analyzes the side-writes to identify or formulate appropriate subset construction rules. To perform a side-write, the data subset constructor 18 accesses the associated data records and analyzes certain characteristics thereof to generate a side-write of the data records. These characteristics include one or more of the following: the data fields of a single data record set, relationships between data fields within a data record set, and relationships among data fields across different data record sets.
A side-write of a data record set is a summary of the data in the data record set. The summary may be provided field by field. The side-write may include information characterizing the data in the set of data records. Examples of such information include cardinality of one or more of the plurality of data fields of the data record, classification of values of one or more of the plurality of data fields, relationships between data fields in the respective data records, and relationships between data records. The side-write of the data record set may also include information characterizing the "pseudo field". A dummy field is a manufactured data field that has been populated with values determined by the operation of certain values taken from one or more data fields in the associated data record.
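As an illustration only, a side-write might be computed along the following lines; the particular statistics gathered (cardinality, minimum, maximum) and the pseudo-field definition are assumptions, not the framework's actual summary format.

```python
# Hedged sketch of a field-by-field side-write (summary) of a record set:
# cardinality, value range, and a simple manufactured "pseudo field".

def side_write(records, pseudo_fields=None):
    fields = {f for rec in records for f in rec}
    summary = {}
    for f in fields:
        values = [rec[f] for rec in records if f in rec]
        summary[f] = {
            "cardinality": len(set(values)),
            "min": min(values),
            "max": max(values),
        }
    # Pseudo fields: manufactured fields computed from existing ones (assumption).
    for name, fn in (pseudo_fields or {}).items():
        values = [fn(rec) for rec in records]
        summary[name] = {"cardinality": len(set(values))}
    return summary

profile = side_write(
    [{"age": 31, "income": 52000}, {"age": 47, "income": 98000}],
    pseudo_fields={"income_band": lambda r: "high" if r["income"] > 75000 else "low"},
)
```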
Based on the generated side-writes of the data records, the data refiner 32 identifies characteristics of the data records that are relevant to selecting a subset of the data records that achieves good code coverage for the application 14. For example, based on the side-writes of the data records, the data refiner 32 may identify one or more data fields, or combinations of data fields, that may be related to input variables and derived variables of the application. In some cases, the subset construction rules may also be formulated based on input received from a user or from a computer storage medium and/or based on results of executing the application 14 (e.g., based on input received from the results analysis module 72).
The data subset constructor 18 may specify subset construction rules based on different analysis methods. In some embodiments, the data subset constructor 18 specifies a subset construction rule based on an analysis of the data fields in the respective data records. In one example, this includes determining which data fields may be relevant to variables in the application 14. In another example, the data subset constructor 18 identifies a target data field based on the number of allowed values for the field. For example, a "gender" data field has only two permitted values and may be identified as a target data field. On the other hand, a "telephone number" data field, with its many possible values, would likely not be identified as a target data field.
In other examples, the data subset constructor 18 identifies as the target data field a dummy field populated with data generated by the manipulation of data in one or more data fields. For example, data in the "revenue" data field may be divided into several categories (e.g., high, medium, or low), and a dummy field populated with the category of the "revenue" data field may be identified as the target data field.
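The two heuristics just described, low-cardinality target fields and binned pseudo fields, can be sketched as follows; the distinct-value threshold and the bin boundaries are arbitrary illustrative choices, not values taken from the disclosure.

```python
# Sketch of the two heuristics above: treat a field as a candidate target field
# when its number of distinct values is small, and otherwise derive a
# low-cardinality pseudo field by binning. Threshold and bin edges are illustrative.

def candidate_target_fields(records, max_distinct=10):
    fields = {f for rec in records for f in rec}
    return [f for f in fields
            if len({rec.get(f) for rec in records}) <= max_distinct]

def bin_revenue(value, low=50_000, high=150_000):
    if value < low:
        return "low"
    return "medium" if value < high else "high"

records = [{"gender": "F", "phone": "555-0101", "revenue": 40_000},
           {"gender": "M", "phone": "555-0102", "revenue": 200_000}]
# On this toy data every field passes the threshold; at production scale only
# low-cardinality fields such as "gender" would remain as candidates.
targets = candidate_target_fields(records)
for rec in records:
    rec["revenue_band"] = bin_revenue(rec["revenue"])  # pseudo field usable as a target
```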
In other examples, the data subset constructor 18 identifies the target data field based on a relationship, indicated by the side-write, between the target data field and one or more other data fields in the same record. For example, a side-write may indicate that the data fields "state" and "zip code" are not independent. Based on this dependency, the data subset constructor 18 may consider only one of these data fields as a possible target data field.
The data subset constructor 18 may also specify one or more subset construction rules based on an analysis of relationships between different data records within and/or between different sets of data records as indicated by the side-writes. For example, a side-write may indicate that multiple data records may be linked via a common value of a data field. An example of a link value may be the value of a client ID data field.
Once the data subset constructor 18 has selected the subset of data records, and once the data verification unit 37 has confirmed their validity, the data design module 16 provides the subset of data records to the computing environment manager 44, which ultimately arranges for the application 14 under test to operate on that subset of data records. The data design module 16 provides the data records making up the data essence 33, or data indicative of those records. For example, the data design module 16 may provide the computing environment manager 44 with identifiers or addresses of the data records making up the data essence 33. The data design module 16 may also provide the computing environment manager 44 with a file containing the selected subset of data records.
After execution, the results analysis module 72 generates a coverage analysis report containing data indicative of the results of executing the application 14 on the data essence 33. In some implementations, the coverage analysis report includes information identifying executed or unexecuted portions of the source code from which the application 14 is compiled, or information identifying how many times portions of that source code were executed. In some implementations, the coverage analysis report includes information identifying the rules that the application 14 executed or did not execute and information identifying the number of times the application 14 executed each rule. In other practices, the coverage analysis report includes information identifying executed or unexecuted portions of the source code from which the application 14 is compiled, together with the number of executions of selected portions of that source code. In other practices, the coverage analysis report includes information identifying errors that occurred while attempting to execute particular portions of the source code from which the application 14 is compiled. In still other implementations, the coverage analysis report includes information identifying errors that occurred when the application 14 attempted to execute particular rules, together with an identification of those rules that caused errors when executed.
In some implementations, the results analysis module 72 generates a coverage analysis report that directly represents those rules that have or have not been executed. In other implementations, the results analysis module 72 generates a coverage analysis report that contains additional information about the execution of the application 14, such as the number of executions of each logic rule, the values of various variables applied during execution, or other information.
In other implementations, for each logic rule in the application that is not executed, the results analysis module 72 identifies one or more variables of the application 14 associated with that logic rule. In some implementations, the results analysis module 72 identifies the variables based on data included in the report (e.g., data indicative of data flow through the application 14) or based on pre-loaded information about the application. In some cases, the results analysis module 72 also identifies values or ranges of values for each variable that should cause the logic rule to execute. Once these are identified, the data design module 16 uses the corresponding input data fields and the values or ranges of values to specify additional subset construction rules for the selection of a later, updated subset of data records.
For example, if the identified variable is an input variable of an application that directly corresponds to one of the data fields in the data record, the data design module 16 identifies the corresponding data field and a value or range of values for that data field.
For example, if a logic rule in the application 14 executes when an input variable is greater than a certain threshold, the data design module 16 determines that any manufactured or refined data should include at least one data record for which that input variable has a value greater than the threshold. Based on this information, the data design module 16 specifies additional subset construction rules so that the subsequent data records provided to the application 14 include data sufficient to cause the logic rule to execute, i.e., at least one record whose input variable exceeds the threshold.
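A minimal sketch of such a rule, assuming a record field named "amount" and an illustrative threshold, follows; it is not the framework's actual rule representation.

```python
# Minimal sketch of the additional rule described above: guarantee that the
# designed test data contains at least one record whose input field exceeds the
# threshold guarding a logic rule. Field name and threshold are assumptions.

def ensure_threshold_case(essence, production, field, threshold):
    if any(rec.get(field, 0) > threshold for rec in essence):
        return essence
    extra = next((rec for rec in production if rec.get(field, 0) > threshold), None)
    if extra is None:
        # No such record exists in production; the data makers would manufacture one.
        extra = {field: threshold + 1}
    return essence + [extra]

essence = ensure_threshold_case(essence=[{"amount": 10}],
                                production=[{"amount": 10}, {"amount": 10_000}],
                                field="amount", threshold=1_000)
```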
In another example, the identified variable does not directly correspond to one of the data fields of the data record. Such variables are referred to as "derived variables". In the case of a derived variable, the data design module 16 analyzes the data lineage to track the derivation of the derived variable through the logic of the application 14. The data lineage analysis can identify the particular input variable or variables from which the identified variable is derived. The data design module 16 then identifies the corresponding data field or fields and the value or range of values for those data fields.
For example, if a logic rule in the application 14 executes when the value of a derived variable equals a particular value, the data design module 16 executes instructions for data lineage analysis and determines that the derived variable is derived from a logical combination of three input variables. By following the logical derivation of the derived variable, the data design module 16 determines the values of the three input variables that are required to produce that particular value of the derived variable.
The values determined to produce the desired value of the derived variable are provided to the data subset constructor 18, which specifies additional subset construction rules such that the data essence 33 includes data sufficient to bring the derived variable to the desired value and thus trigger execution of the associated logic rule.
In some examples, the results of the coverage analysis are also provided to the user. In response, the user may provide additional subset construction rules to the data subset constructor 18 or may adjust previously provided subset construction rules.
Some logic rules are rarely triggered, and even a complete set of data records may, just by chance, not include enough data for the application 14 to execute the code that implements those logic rules. To identify such deficiencies in the complete data set, the application 14 may be executed one or more times using all of the data records as input. Regardless of the subset of data records selected as input, the resulting report identifies rules that are not covered. To address this deficiency, the data driven test framework 10 uses the positive data maker 22 and/or the negative data maker 24 to make the required data.
In some embodiments, the data design module 16 performs data subset construction (data-subsetting) by filtering. The filtering may be positive or negative. In positive filtering, one starts with an empty set and adds only those data records that meet certain conditions. In negative filtering, one starts with the full dataset and prunes it by deleting data records that satisfy some condition.
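The two filtering styles can be sketched as follows; the record layout and the selection conditions are illustrative assumptions, not the module's actual filters.

```python
# Illustrative sketch of the two filtering styles: positive filtering builds up
# from an empty set, negative filtering prunes the full set.

def positive_filter(records, keep):
    subset = []
    for rec in records:
        if keep(rec):            # add only records meeting the condition
            subset.append(rec)
    return subset

def negative_filter(records, drop):
    return [rec for rec in records if not drop(rec)]   # delete records meeting the condition

data = [{"state": "NY"}, {"state": "CA"}, {"state": "TX"}]
kept = positive_filter(data, keep=lambda r: r["state"] in {"NY", "CA"})
pruned = negative_filter(data, drop=lambda r: r["state"] == "TX")
assert kept == pruned
```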
In other embodiments, the data design module 16 performs data subset construction by identifying target data fields, determining possible values for each such field, and selecting data records such that each allowed value occurs at least once, or a specified number of times, for each target data field.
In other embodiments, the data design module 16 performs data subset construction by data classification. This is similar to the method of identifying target data fields, but with ranges of values taking the place of individual target values. Thus, if the target data field indicates cholesterol content for risk assessment, one can use ranges to define containers (bins) representing low, medium, and high levels. In this case, the data records may be selected such that each container or category has some predetermined number of records.
In other embodiments, the data design module 16 performs data subset construction by combinations of dependent values. This can be understood by considering two target data fields: a first field with two allowed values (e.g., gender) and a second field with twelve allowed values (e.g., birth month). If one only wants to ensure that each possible value is represented at least once, as few as twelve records may satisfy the requirement. However, one may instead wish to have all possible combinations of these two fields represented. In that case, at least twenty-four records are selected.
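The combination requirement can be illustrated with the following sketch, which counts the (gender, birth month) pairs still missing from a candidate subset; the field names are assumptions.

```python
# Worked sketch of the combination requirement above: covering every
# (gender, birth month) pair at least once requires at least 2 x 12 = 24 records.
from itertools import product

GENDERS = ["F", "M"]
MONTHS = list(range(1, 13))

def missing_combinations(records):
    covered = {(rec["gender"], rec["birth_month"]) for rec in records}
    return [c for c in product(GENDERS, MONTHS) if c not in covered]

subset = [{"gender": g, "birth_month": m} for g, m in product(GENDERS, MONTHS)]
assert len(subset) == 24 and not missing_combinations(subset)
```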
Additional details of the above-described methods, as well as other methods that may be implemented by the data subset constructor 18, may be found in the patent publication entitled "data record selection," which has been incorporated by reference.
The data design module 16 performs calculations using the positive data producer 22, the negative data producer 24, and the data enhancer 20 in accordance with the principles set forth in the application "data generation" (which application has been incorporated by reference).
The data design module 16 generates data of particular types that the user may specify. Exemplary data types include character strings, decimal numbers, integers, dates, and times. The data design module 16 imposes limitations on the manufactured data, such as the range of allowable values for manufactured decimal or integer data, the average string length for manufactured string data, the set of values or characters that may be used in the manufactured data, and other characteristics. The data design module 16 may manufacture data by adjusting values in one or more fields of an existing source record, by creating and populating new fields in a record to enhance a source record, or by creating entirely new records. In some instances, the user specifies these configurable options through a user interface.
The data design module 16 uses the positive data producer 22 to produce data for processing by the application 14. It may also use the data enhancer 20 to adjust or enhance existing data, such as production data 26. For example, data enhancer 20 may adjust the values of one or more fields taken from production data 26, or may create and populate one or more new fields in production data 26 and add them to an existing data record. The data design module 16 may also make entirely new data records using the positive data maker 22. In some embodiments, the format of these new records is based on production data 26, but in other embodiments, the format will be specified by an external agent (e.g., a user) using the same methods discussed above with respect to specifying the subset construction rules.
The data design module 16 manufactures the data to be stored in the target. In some examples, the data design module 16 manufactures the data based on the production data 26. In other examples, the data design module 16 manufactures the data from scratch. As used herein, manufacturing "from scratch" refers to manufacturing according to specified characteristics without basing the data on existing data.
The production data may be a file, a database, a parameter set, or another data source. The production data 26 may include one or more records, each having one or more data fields. For example, the production data 26 may be a customer database that stores customer records for customers of a retail store. Each record of such a database represents an individual customer. Each record may have multiple fields. The production data 26 may have a record format that specifies the format of each record, such as the number of fields, the type of data in each field, and the characteristics of the data in each field (e.g., an allowed range of values, a maximum allowed value, or a list of allowed characters). In some examples, the data design module 16 generates data from scratch. In such cases, no data source is provided.
The data design module creates data based on the configuration data, which may be stored in a database, in a file, or in other data structures. The configuration data may specify a data generation method to be used, a content generation mode, a data type of the data to be manufactured, a content standard corresponding to the data to be manufactured, and other configuration information about the data to be manufactured.
In some cases, a user specifies some or all of the configuration data that data design module 16 uses for manufacturing data through a user interface available on test computer 12. In other examples, the data design module 16 determines some or all of the configuration data. In these cases, the data design module makes the determination based on an analysis of the production data or information about the desired attributes of the target.
In some examples, the data design module 16 may manufacture data for a target using the data enhancer 20 by adjusting values of one or more fields of existing source records in the production data 26 according to the configuration data and storing the adjusted records in the target. In other examples, data design module 16 adjusts all values of a given field using data enhancer 20. For example, a given field of each record may be assigned a value such that the distribution of values for a particular field of all records matches the target distribution specified by the configuration data. The user or configuration data specifies (or provides information for specifying) the target distribution.
In some cases, data design module 16 adjusts less than all of the values of a given field. In these cases, the data design module 16 adjusts only values that do not meet the particular criteria specified by the configuration data. An example of this is where the data design module 16 adjusts any value of a given field that falls outside of the field's particular allowable range of values.
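By way of illustration, such an adjustment might be sketched as follows, assuming an "age" field with an allowed range of 0 to 120 and a simple clamping policy; neither assumption comes from the disclosure.

```python
# Hedged sketch of the adjustment just described: only values that fall outside
# an allowed range are rewritten; in-range values are left alone.

def adjust_out_of_range(records, field, low, high):
    for rec in records:
        value = rec.get(field)
        if value is not None and not (low <= value <= high):
            rec[field] = min(max(value, low), high)   # clamp into the allowed range
    return records

adjusted = adjust_out_of_range(
    [{"age": 27}, {"age": -3}, {"age": 240}], field="age", low=0, high=120)
```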
In some examples, the data design module 16 may manufacture data by using the data enhancer 20 to augment existing source records of the production data 26 with one or more new fields according to the configuration data, and storing these augmented records in the target. The configuration data provides instructions for determining the number of new fields, the data types and values of the new fields, and other characteristics of the new fields.
In other examples, the data design module 16 uses information provided by the configuration data to manufacture the data. This information may specify that the value of a new field is to be manufactured based on the data of an existing field of the production data. Alternatively, the information may specify that the value of a new field is to be manufactured according to certain characteristics that are specified by the configuration data rather than based on any existing data.
In some examples, the data design module 16 may manufacture the data by using the data enhancer 20 to enhance existing source records of the production data 26 with one or more new records according to the configuration data and storing the enhanced records (i.e., the existing source records and the new records) in the target. In some embodiments, the new record has the same record format as the source record.
In other examples, the configuration data provides instructions for determining any combination of one or more of: the number of new records, the value of the fields of the new records, and other characteristics of the new records. In these examples, the configuration data specifies the values of one or more fields in the new record that are to be manufactured from scratch.
In some other examples, the configuration data specifies a side-write and requires that the values of one or more fields in the new records to be manufactured satisfy that side-write. In one such example, the side-write specifies that the values of a particular field across all records collectively satisfy a specified property. An example of such a property is that the values have a particular mean or a particular distribution. For example, for a customer database source, the configuration data may require that records be manufactured such that the values of the "age" field across all records satisfy a Poisson distribution with a particular mean.
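A hedged sketch of this configuration-driven requirement follows; the record layout, the use of Knuth's method for Poisson sampling, and the chosen mean are illustrative assumptions rather than details of the disclosed modules.

```python
# Hedged sketch: manufacture customer records whose "age" values follow a
# Poisson distribution with a given mean. Layout and sampling method are illustrative.
import math
import random

def manufacture_customers(n, age_mean, seed=0):
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's method; adequate for a sketch with modest means.
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            k += 1
            p *= rng.random()
            if p <= threshold:
                return k - 1

    return [{"customer_id": i, "age": poisson(age_mean)} for i in range(n)]

records = manufacture_customers(n=1000, age_mean=42)
```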
In some examples, the configuration data requires that the data design module 16 apply more than one data generation method. For one such example, the data design module 16 applies any combination of the following methods: adjusting values of one or more fields, enhancing a source record with one or more new fields, and enhancing a source record with one or more new records.
In some examples, the target stores only manufactured records. In other examples, the user specifies a source, and the data design module 16 manufactures records based on characteristics of that source. Examples of suitable characteristics are the record format of the source or a side-write of one or more fields of the source.
In other examples, no source is specified. In such an example, the data design module 16 manufactures records entirely from scratch according to the configuration data.
In some examples, the record format of the source is mapped to the target. In one such example, the configuration data indicates that the record format of the source is to be adopted by the target. In another such example, the configuration data requires that the record format of the source be applied to the target and that new records be made from scratch by the data design module 16 according to that record format. In other such examples, the data design module 16 relies on multiple sources, and the record format of each source is mapped partially or fully to the target. In at least one such example, the format of the fields of interest of each source is mapped to the target.
In some examples, the data design module 16 maps the record format of the source to the target and adjusts it. In these examples, the configuration data may cause the data design module 16 to change the name of a field, or may cause a field taken from the source to be removed.
The data design module 16 provides a user interface on the test computer 12 having a source window to enable a user to identify a data source. The source window includes a source type menu that allows the user to specify the source type (e.g., file or database) and an identifier of the source (e.g., path to source or path to configuration file of database source). In some examples, when the source is a database, the user specifies a query (e.g., an SQL query) that is used to retrieve the source data from the database. The source window provides an option for the user to indicate whether the data design module 16 will make a new record and, if so, how much. The source window enables the user to view and specify other information about the source. For example, a user may view the record format of the source, specify a file that defines the record format of the source, view the source data, or view a side-write of the source data.
In some examples, a source window of the user interface allows the user to have the data design module 16 manufacture the data without specifying a source. In particular, the source window enables a user to select manufacturing data as a source type in a source type menu. Selecting the manufacturing data as the source type causes a data generation window to be displayed in the user interface. The data generation window enables a user to specify the method used to manufacture the data and to specify the number of new records to be manufactured.
The user interface also provides a target window that enables the user to identify the target. A target type menu in the target window enables the user to specify the type of target. Examples of targets include files and databases. The target window also enables the user to specify an identifier of the target (e.g., a path to the target file or a path to a configuration file for the target database). Once the source and target are identified, the target window provides a run button that gives the user access to various configurable options used to generate the data.
The data design module 16 provides a variety of methods for manufacturing data. These include field adjustment, field creation, record creation, use of an existing source, and use of a parent dataset. The user accesses the available methods through the data generation window of the user interface.
In the field adjustment method, the data design module 16 adjusts the values of one or more fields of the source record. In some cases, the data design module adjusts all values of a given field. In some examples, data design module 16 adjusts the values of the fields such that the distribution of values for a given field in all records matches a target distribution. In another example, the data design module 16 adjusts less than all of the values of a given field. In these examples, data design module 16 adjusts only values that do not meet specified criteria. For example, any value that falls outside of a particular allowable range of values for a particular field may be adjusted.
In the field creation method, the data design module 16 creates one or more new fields for an existing record. In some examples, data design module 16 manufactures values for new fields based on data for existing fields in the source data. In other examples, data design module 16 manufactures values for new fields from scratch.
In the record creation method, the data design module 16 creates a new record. The user specifies at least one of the number of new records and their format. For example, if the target is filled with both an existing source record and a newly manufactured record, the record format of the new record is the same as the record format of the source record. If the target is populated with only newly manufactured records, the user specifies the record format to be applied to the manufacturing data. The record format includes the number of fields, the data type of each field, the data characteristics of each field (e.g., maximum, minimum, set of allowed characteristics, and other characteristics), and other characteristics of the record format.
In the existing data set approach, the data design module 16 makes a specified number of new records for each key value in an existing source record. The key value is the value in the field of interest in the existing source record.
In one example, an auxiliary source contains data to be used to populate specific fields of the target records. However, the auxiliary source does not have a record format that matches the record format of the source or the target. In this case, the data design module 16 maps one or more fields of interest from the auxiliary source to the target records. In the parent-dataset approach, the source is the parent dataset in a hierarchy. In this case, the data design module 16 makes a child data set related to the parent data set. In one example of the parent-dataset approach, the parent dataset (which acts as the source) is a set of customer records; the child data set (which serves as the target) is a set of one or more transaction records for each customer. A key field links a record in the child data set to a corresponding record in the parent data set. For example, a "customer ID" field may be the key field linking a customer record and a transaction record. In some cases, the data design module 16 receives a specification of how many child records are to be manufactured. In other cases, the data design module 16 receives a specification of the percentage of parent records that are not used to make child records. In still other cases, the data design module 16 receives a specification of the data format of the child records.
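The parent-dataset method can be sketched as follows; the counts, the field names, and the transaction amounts are assumptions made for illustration only.

```python
# Illustrative sketch of the parent-dataset method: manufacture a configurable
# number of child (transaction) records per parent (customer) record, linked by a key field.
import random

def manufacture_children(parents, key_field="customer_id",
                         children_per_parent=3, skip_fraction=0.0, seed=0):
    rng = random.Random(seed)
    children = []
    for parent in parents:
        if rng.random() < skip_fraction:     # fraction of parents left without children
            continue
        for i in range(children_per_parent):
            children.append({
                key_field: parent[key_field],             # key field links child to parent
                "transaction_id": f"{parent[key_field]}-{i}",
                "amount": round(rng.uniform(1, 500), 2),
            })
    return children

transactions = manufacture_children([{"customer_id": "C001"}, {"customer_id": "C002"}],
                                    children_per_parent=2, skip_fraction=0.25)
```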
In some examples, the data design module 16 manufactures the data according to a format specification. The format specification specifies the format of the data to be manufactured. In one example, the format specification specifies a data type of the data to be manufactured.
In other examples, the data design module 16 manufactures the data according to content standards. Content standards limit the characteristics of the data to be manufactured. Examples of content standards include an allowed range of values, a maximum allowed value, and a list of allowed characters.
In some cases, the record format of the target record specifies a format specification and a content standard. In other examples, the user interface provides a field window that enables a user to specify characteristics of the field, such as a format specification or a content standard for the field.
The user interface also includes a record format window that enables a user to edit the target record format. This includes editing the data characteristics of one or more fields of the target. The record format window displays a list of the fields of the target record format. The field list also indicates the data type of each field. In some examples, fields of the target record format are also present in the source record format. Fields that are present in both the target record format and the source record format are optionally labeled in the field list. In some examples, unlabeled fields are present only in the target record format. In other examples, fields that appear in the source record format but not in the target record format are omitted from the field list.
The record format window enables a user to select one or more fields of a target record format for communicating data generation characteristics to the data design module 16. To assist the user in keeping track of the selected fields, the user interface includes a selection list of the selected fields in the target record format. The fields listed in the selection list are those fields of the target record format that the user wants to specify the data generation characteristics.
In some examples, the selection list is a subset of a field list of all fields of the target record format. This may occur if the user only wants to specify data generation characteristics for some fields of the target record format.
The user interface enables a user to edit the record format of each of the selected fields displayed in the selection list. For example, for each selected field, the user may perform any combination of specifying a data type for the field, assigning a content generation mode to the field, and specifying data characteristics for the field. The user interface displays one or more of a data type window, a content generation window, and a data properties window for each selected field in turn. These windows enable the user to specify various characteristics for each selected field.
The data driven test framework 10 described above may be implemented, for example, using a programmable computing system executing suitable software instructions, or may be implemented in suitable hardware such as a Field Programmable Gate Array (FPGA) or some hybrid form. For example, in a programmed method, software may comprise processes in one or more computer programs executing on one or more programmed or programmable computing systems (which may be of various architectures such as distributed, client/server, or grid) each comprising at least one processor, at least one data storage system (including volatile and/or non-volatile memory and/or storage elements), at least one user interface (for receiving input using at least one input device or port, and for providing output using at least one output device or port). The software may include, for example, one or more modules of a larger program that provides services related to the design, configuration, and execution of dataflow graphs. The modules of a program (e.g., elements of a dataflow graph) may be implemented as data structures or other organized forms of data that conform to a data model stored in a data repository.
The software may be stored in a non-transitory form, such as embodied in a volatile or non-volatile storage medium or any other non-transitory medium, using physical characteristics (e.g., surface pits and lands, magnetic domains, or electrical charges, etc.) of the medium for a period of time (e.g., the time between refresh cycles of a dynamic memory device (e.g., dynamic RAM)). In preparation for loading the instructions, the software may be provided on a tangible, non-transitory medium such as a CD-ROM or other computer-readable medium (e.g., readable by a general or special purpose computing system or device), or may be delivered over a communication medium of a network (e.g., encoded as a propagated signal) to a computing system where it is executed. Some or all of the processing may be performed on a special purpose computer or using special purpose hardware, such as a coprocessor or a Field Programmable Gate Array (FPGA) or a special Application Specific Integrated Circuit (ASIC). The processing may be implemented in a distributed manner, with different portions of the computation specified by the software being performed by different computing elements. Each such computer program is preferably stored on or downloaded to a computer readable storage medium (e.g., solid state memory or media, or magnetic or optical media) of a storage device accessible by a general or special purpose programmable computer, for configuring and operating the computer when the storage device medium is read by the computer to perform the processes described herein. The system of the invention may also be considered to be implemented as a tangible, non-transitory medium configured with a computer program, where the medium so configured causes a computer to operate in a specific and predefined manner to perform one or more of the process steps described herein.
Various embodiments of the present invention have been described. It is to be understood, however, that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the appended claims. Accordingly, other embodiments are within the scope of the following claims. For example, various modifications may be made without departing from the scope of the invention. Additionally, some of the steps described above may be order independent, and thus may be performed in a different order than that described.
Having described the invention and preferred embodiments thereof, what is claimed as new is defined by the appended claims.
Claims (amended under Article 19 of the Treaty)
1. An apparatus for testing an application, the apparatus comprising:
a data handler including a memory and a processor operatively coupled to the memory, the data handler having been configured to implement a data driven test framework including a data design module, a computing environment manager, and a results analysis module;
wherein the data design module is configured to create designed test data based at least in part on an application under test;
wherein the computing environment manager is configured to control a computing environment in which the application operates on the designed test data; and
wherein the results analysis module is configured to compare the designed test data operated on by the application to expected outputs.
2. The apparatus of claim 1,
wherein the data design module is configured to extract a subset of the production data;
wherein the subset is selected to achieve a specified code coverage, an
Wherein the designed test data comprises the subset of the production data.
3. The apparatus of claim 1, wherein the data design module comprises a data refiner for generating refined data from production data.
4. The apparatus of claim 34, wherein the additional data is selected to achieve a specified code coverage.
5. The apparatus of claim 1, wherein the data design module comprises a data enhancer to receive refined data from the data refiner and to enhance the refined data.
6. The apparatus of claim 1,
wherein the data design module is configured to generate data based at least in part on the application under test;
wherein the generated data is selected to achieve a specified code coverage;
wherein the designed test data includes the generated data.
7. The apparatus of claim 1, wherein the data design module further comprises a positive data producer for generating positive data.
8. The apparatus of claim 1, wherein the data design module is configured to generate data based at least in part on an application under test, wherein the data is not present in production data.
9. The apparatus of claim 1, wherein the data design module further comprises a negative data producer for generating negative data.
10. The apparatus of claim 1, wherein the data design module comprises means for generating designed test data.
11. The apparatus of claim 1, wherein the data design module includes an integrity checker to determine referential integrity of the designed test data.
12. The apparatus of claim 1, wherein the data design module is further configured to detect an error in referential integrity.
13. The apparatus of claim 1, wherein the data design module includes a re-citation device to correct for lack of referential integrity in the data before outputting the data as designed test data.
14. The apparatus of claim 1, wherein the data design module is further configured to correct for a lack of referential integrity in the data.
15. The apparatus of claim 1, wherein the data design module comprises a verification unit to receive the designed test data and enable a user to perform at least one of: checking the designed test data and side-writing the designed test data.
16. The apparatus of claim 1, wherein the data testing module comprises a data verification unit to receive the designed test data and enable a user to view the designed test data.
17. The apparatus of claim 1, wherein the data design module includes a side writer to receive the designed test data and enable a user to side write the designed test data.
18. The apparatus of claim 1, wherein the data design module is further configured to enable a user to side-write the designed test data.
19. The apparatus of claim 1, wherein the data design module is further configured to enable a user to view the designed test data.
20. The apparatus of claim 1, wherein the data design module comprises a plurality of devices for generating designed test data, wherein a particular device for generating designed test data is generated based at least in part on information related to the application under test.
21. The apparatus of claim 1, wherein the data design module comprises a data enhancer, a data refiner, a negative data producer, and a positive data producer, each of which is configured to provide data forming a basis for the designed test data.
22. The apparatus of claim 1, wherein the data design module comprises a logic extractor configured to identify logic functions to be tested in the application under test and provide those logic functions to a data refiner.
23. The apparatus of claim 1, wherein the data design module is further configured to identify logic functions to be tested in the application under test and provide the logic functions to be used as a basis for obtaining the production data subset.
24. The apparatus of claim 1, wherein said computing environment manager comprises means for automatically creating and uninstalling a computing environment in which the application under test is tested.
25. The apparatus of claim 1, wherein the computing environment manager comprises a context converter;
wherein the context converter is configured to identify a source of the designed test data; and
wherein the context converter is further configured to identify a target in which data obtained by the application under test processing the designed test data is placed.
26. The apparatus of claim 1, wherein the context converter is further configured to copy the designed test data from the first repository to the source.
27. The apparatus of claim 26, wherein the context converter is further configured to copy the designed test data from the target to a second repository.
28. The apparatus of claim 1, wherein the computing environment manager comprises an environment backup machine and a recovery machine;
wherein the environment backup machine is configured to back up a first environment prior to converting the first environment into a second environment;
wherein the recovery machine is configured to replace the second environment with the first environment; and
wherein the second environment is an environment in which testing of an application to be tested is to be performed.
29. The apparatus of claim 1, wherein the computing environment manager comprises an executor, wherein the executor is configured to cause an application under test to execute.
30. The apparatus of claim 29, wherein the executor is configured to automatically execute a script when causing the application to execute.
31. The apparatus of claim 1, wherein the computing environment manager comprises a context converter, an environment backup machine, a recovery machine, and an executor;
wherein the context converter is configured to identify a source of the designed test data;
wherein the context converter is further configured to identify a target in which data obtained by the application under test processing the designed test data is placed;
wherein the environment backup machine is configured to back up a first environment prior to converting the first environment into a second environment;
wherein the recovery machine is configured to replace the second environment with the first environment;
wherein the second environment is an environment in which testing of an application to be tested is to be performed; and
wherein the executor is configured to start execution of the application under test.
32. A method of processing data in a computing system, the method comprising: testing an application, wherein testing the application includes receiving information indicative of an application under test through one of an input device and a port of the data processing system, and processing the received information;
wherein processing the received information comprises: creating designed test data based at least in part on the information; controlling a computing environment in which the application operates on the designed test data; and comparing the designed test data operated on by the application with expected outputs,
the method also includes outputting a result indicative of the comparison.
33. Software stored in a non-transitory form on a computer-readable medium for managing testing of an application, the software including instructions for causing a computing system to perform process steps comprising:
creating designed test data based at least in part on the application under test;
controlling a computing environment in which the application operates on the designed test data;
comparing the designed test data operated on by the application to expected outputs; and
outputting an analysis of the comparison.
34. The apparatus of claim 1, wherein the data design module is configured to extract a subset of existing data;
wherein the data design module is further configured to enhance the subset, thereby producing enhanced data; and
wherein the designed test data includes the enhanced data.
35. The apparatus of claim 34, wherein the enhanced data comprises one or more fields added to one or more records of the subset.
36. The apparatus of claim 35, wherein the data design module is further configured to generate data to populate the added one or more fields based on one or more provided rules.
37. The apparatus of claim 1, wherein the data design module is configured to create designed test data by refining existing data, wherein the designed test data is more logically intensive than the existing data.
Claims (33)
1. An apparatus for testing an application, the apparatus comprising:
a data handler including a memory and a processor operatively coupled to the memory, the data handler having been configured to implement a data driven test framework including a data design module, a computing environment manager, and a results analysis module;
wherein the data design module is configured to create designed test data based at least in part on an application under test;
wherein the computing environment manager is configured to control a computing environment in which the application operates on the designed test data; and
wherein the results analysis module is configured to compare the designed test data operated on by the application to expected outputs.
2. The apparatus of claim 1,
wherein the data design module is configured to extract a subset of the production data;
wherein the subset is selected to achieve a specified code coverage; and
wherein the designed test data comprises the subset of the production data.
3. The apparatus of claim 1, wherein the data design module comprises a data refiner for generating refined data from production data.
4. The apparatus of claim 1, wherein the data design module is configured to extract a subset of production data;
wherein the data design module is further configured to enhance the subset with additional data, thereby generating enhanced data;
wherein the additional data is selected to achieve a specified code coverage;
wherein the designed test data includes the enhanced data.
5. The apparatus of claim 1, wherein the data design module comprises a data enhancer to receive refined data from a data refiner and to enhance the refined data.
6. The apparatus of claim 1,
wherein the data design module is configured to generate data based at least in part on the application under test;
wherein the generated data is selected to achieve a specified code coverage;
wherein the designed test data includes the generated data.
7. The apparatus of claim 1, wherein the data design module further comprises a positive data producer for generating positive data.
8. The apparatus of claim 1, wherein the data design module is configured to generate data based at least in part on an application under test, wherein the data is not present in production data.
9. The apparatus of claim 1, wherein the data design module further comprises a negative data producer for generating negative data.
10. The apparatus of claim 1, wherein the data design module comprises means for generating designed test data.
11. The apparatus of claim 1, wherein the data design module includes an integrity checker to determine referential integrity of the designed test data.
12. The apparatus of claim 1, wherein the data design module is further configured to detect an error in referential integrity.
13. The apparatus of claim 1, wherein the data design module includes a re-referencer to correct for a lack of referential integrity in the data before outputting the data as designed test data.
14. The apparatus of claim 1, wherein the data design module is further configured to correct for a lack of referential integrity in the data.
15. The apparatus of claim 1, wherein the data design module comprises a verification unit to receive the designed test data and to enable a user to at least one of inspect the designed test data and profile the designed test data.
16. The apparatus of claim 1, wherein the data design module comprises a data verification unit to receive the designed test data and enable a user to view the designed test data.
17. The apparatus of claim 1, wherein the data design module includes a profiler to receive the designed test data and enable a user to profile the designed test data.
18. The apparatus of claim 1, wherein the data design module is further configured to enable a user to profile the designed test data.
19. The apparatus of claim 1, wherein the data design module is further configured to enable a user to view the designed test data.
20. The apparatus of claim 1, wherein the data design module comprises a plurality of devices for generating designed test data, wherein a particular device for generating the designed test data is selected based at least in part on information related to the application under test.
21. The apparatus of claim 1, wherein the data design module comprises a data enhancer, a data refiner, a negative data producer, and a positive data producer, each of which is configured to provide data forming a basis for the designed test data.
22. The apparatus of claim 1, wherein the data design module comprises a logic extractor configured to identify logic functions to be tested in the application under test and provide those logic functions to a data refiner.
23. The apparatus of claim 1, wherein the data design module is further configured to identify logic functions to be tested in the application under test and to provide the logic functions to be used as a basis for obtaining a subset of production data.
24. The apparatus of claim 1, wherein said computing environment manager comprises means for automatically creating and uninstalling a computing environment in which the application under test is tested.
25. The apparatus of claim 1, wherein the computing environment manager comprises an environment converter;
wherein the environment converter is configured to identify a source of the designed test data; and
wherein the environment converter is further configured to identify a target in which data obtained by the application under test processing the designed test data is placed.
26. The apparatus of claim 1, wherein the environment converter is further configured to copy the designed test data from a first repository to the source.
27. The apparatus of claim 26, wherein the environment converter is further configured to copy data from the target to a second repository.
28. The apparatus of claim 1, wherein the computing environment manager comprises an environment backup and a restorer;
wherein the environment backup is configured to back up a first environment prior to converting the first environment to a second environment;
wherein the restorer is configured to replace the second environment with the first environment; and
wherein the second environment is the environment in which testing of the application under test is to be performed.
29. The apparatus of claim 1, wherein the computing environment manager comprises an executor, wherein the executor is configured to cause an application under test to execute.
30. The apparatus of claim 29, wherein the executor is configured to automatically execute a script when causing the application to execute.
31. The apparatus of claim 1, wherein the computing environment manager comprises an environment converter, an environment backup, a restorer, and an executor;
wherein the environment converter is configured to identify a source of the designed test data;
wherein the environment converter is further configured to identify a target in which data obtained by the application under test processing the designed test data is placed;
wherein the environment backup is configured to back up a first environment prior to converting the first environment to a second environment;
wherein the restorer is configured to replace the second environment with the first environment;
wherein the second environment is the environment in which testing of the application under test is to be performed; and
wherein the executor is configured to start execution of the application under test.
32. A method of processing data in a computing system, the method comprising: testing an application, wherein testing the application includes receiving information indicative of an application under test through one of an input device and a port of the computing system, and processing the received information;
wherein processing the received information comprises: creating designed test data based at least in part on the information; controlling a computing environment in which the application operates on the designed test data; and comparing the designed test data operated on by the application with expected outputs,
the method further comprising outputting a result indicative of the comparison.
33. Software stored in a non-transitory form on a computer-readable medium for managing testing of an application, the software including instructions for causing a computing system to perform process steps comprising:
creating designed test data based at least in part on the application under test;
controlling a computing environment in which the application operates on the designed test data;
comparing the designed test data operated on by the application to expected outputs; and
outputting an analysis of the comparison.
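The environment-management and result-comparison steps recited in the claims above (backing up a first environment, converting it to a second environment in which the application under test runs, executing the application on the designed test data, comparing actual output with expected output, and then restoring the first environment) can be summarized in a brief sketch. It is a minimal illustration under assumed conventions, not the claimed apparatus: environments are modeled as directories, the application under test is a command-line program, and all function and parameter names are hypothetical.

```python
# Minimal sketch of the backup / convert / execute / restore cycle and the
# output comparison described above. Environments are modeled as plain
# directories and the application under test as a command-line program;
# every name here is a hypothetical placeholder.

import shutil
import subprocess
from pathlib import Path


def backup_environment(env_dir: Path, backup_dir: Path) -> None:
    """Environment backup: preserve the first environment before conversion."""
    shutil.copytree(env_dir, backup_dir, dirs_exist_ok=True)


def restore_environment(backup_dir: Path, env_dir: Path) -> None:
    """Restorer: replace the second environment with the first environment."""
    shutil.rmtree(env_dir, ignore_errors=True)
    shutil.copytree(backup_dir, env_dir)


def run_application(app_cmd: list[str], source: Path, target: Path) -> None:
    """Executor: run the application under test on the designed test data."""
    subprocess.run(app_cmd + [str(source), str(target)], check=True)


def compare_output(target: Path, expected: Path) -> bool:
    """Results analysis: compare the actual output with the expected output."""
    return target.read_bytes() == expected.read_bytes()


def run_test(app_cmd: list[str], env_dir: Path, backup_dir: Path,
             source: Path, target: Path, expected: Path) -> bool:
    backup_environment(env_dir, backup_dir)        # back up the first environment
    try:
        run_application(app_cmd, source, target)   # execute in the second environment
        return compare_output(target, expected)    # analyze the results
    finally:
        restore_environment(backup_dir, env_dir)   # replace the second environment with the first
```

A call such as `run_test(["./app_under_test"], Path("env"), Path("env_backup"), Path("designed_input.dat"), Path("actual_output.dat"), Path("expected_output.dat"))` would exercise the full cycle; a real environment manager would handle databases, services, and configuration rather than flat directories.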
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/047,256 | 2014-09-08 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1242445A1 (en) | 2018-06-22 |
| HK1242445B (en) | 2021-07-23 |
Similar Documents
| Publication | Title |
|---|---|
| CN107077413B (en) | Data driven test framework |
| US10275601B2 | Flaw attribution and correlation |
| Kamimura et al. | Extracting candidates of microservices from monolithic application code |
| US20200019494A1 | Method and apparatus for performing test by using test case |
| AU2010258731B2 | Generating test data |
| Meurice et al. | Detecting and preventing program inconsistencies under database schema evolution |
| US20160034379A1 | Information technology testing and testing data management |
| US7203671B1 | System and method for validating the technical correctness of an OLAP reporting project |
| CN111625472B | A unit testing method and device |
| Faragó et al. | Connection between version control operations and quality change of the source code |
| CN117215552A | Interactive component generation method and device, storage medium and computer equipment |
| CN111930606B | Automatic generation method and related device for data processing flow test report |
| HK1242445A1 (en) | Data-driven testing framework |
| Schreck et al. | Augmenting software project managers with predictions from machine learning |
| Püroja | LDBC Social Network Benchmark Interactive v2 |
| US12182176B2 | System and method for intelligent synthetic test data generation |
| Rajkumar | Enhancing Coverage and Robustness of Database Generators |
| Ye | An Evaluation on Using Coarse-grained Events in an Event Sourcing Context and its Effects Compared to Fine-grained Events |
| HK1242445B (en) | Data-driven testing framework |
| Yoon et al. | Development of Online Egg Grading Information Management System with Data Warehouse Technique |
| Tanuska et al. | The Proposal of the Essential Strategies of DataWarehouse Testing |
| CN118259874A | Analysis method, system and electronic equipment for product design |
| Kaggwa et al. | Technology policy: Determining effects of incentives for industry competitiveness using system dynamics |
| KR20050027462A | Program automatic generating tools and method |
| Lewis | A methodology for integrating maintainability into large-scale software using software metrics |