Building and Testing Datasets

This section describes how to build and test datasets using rules_contest.

What is a dataset?

A dataset is a zip archive containing the data files used to evaluate solution programs. Each data file in a dataset is named <basename>.<extension>, where the basename contains no period.

A test case consists of the data files in a dataset that share the same basename.

For example, a dataset consisting of the four files 00_sample1.in, 00_sample1.out, 00_sample2.in and 10_random.in contains three test cases: 00_sample1, 00_sample2 and 10_random.
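The grouping rule can be sketched in a few lines of Python. This is only an illustration of the naming convention described above, not code from rules_contest; the function name group_test_cases is hypothetical.

```python
from collections import defaultdict


def group_test_cases(file_names):
    """Group data file names by basename (the part before the first period)."""
    cases = defaultdict(list)
    for name in file_names:
        # The basename contains no period, so the first period ends it.
        basename, extension = name.split(".", 1)
        cases[basename].append(extension)
    return {basename: sorted(exts) for basename, exts in sorted(cases.items())}


files = ["00_sample1.in", "00_sample1.out", "00_sample2.in", "10_random.in"]
test_cases = group_test_cases(files)
print(test_cases)
# {'00_sample1': ['in', 'out'], '00_sample2': ['in'], '10_random': ['in']}
```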

A dataset is often built from other datasets in a pipeline of rules. The following diagram illustrates an example pipeline that builds a final dataset from static input data files, a dataset generator program and a reference solution program.

Dataset Pipeline

Building datasets

Several rules are provided to build datasets.

The dataset_merge rule builds a dataset from zero or more datasets and zero or more data files. It can be used to build a dataset from static data files as well as to merge multiple datasets.

dataset_merge
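To make the merge operation concrete, here is a minimal Python sketch that combines dataset archives and loose data files into one zip archive. It is not the rule's implementation: the function name merge_datasets is hypothetical, and rejecting duplicate file names is an assumption made for this sketch, since the text above does not say how name conflicts are handled.

```python
import zipfile
from pathlib import Path


def merge_datasets(output_zip, dataset_zips=(), data_files=()):
    """Write one dataset archive from existing archives and loose data files."""
    seen = set()
    with zipfile.ZipFile(output_zip, "w") as out:
        # Copy every data file from each input dataset archive.
        for dataset_zip in dataset_zips:
            with zipfile.ZipFile(dataset_zip) as src:
                for name in src.namelist():
                    if name in seen:
                        # Assumption of this sketch: duplicates are an error.
                        raise ValueError(f"duplicate data file: {name}")
                    seen.add(name)
                    out.writestr(name, src.read(name))
        # Add loose data files under their own file names.
        for data_file in data_files:
            name = Path(data_file).name
            if name in seen:
                raise ValueError(f"duplicate data file: {name}")
            seen.add(name)
            out.write(data_file, arcname=name)
    return sorted(seen)
```

Calling merge_datasets with no inputs produces an empty archive, which mirrors the "zero or more" wording above.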

The dataset_generate rule builds a dataset by running a program that writes data files to the directory given by the OUTPUT_DIR environment variable. This rule is typically used to generate random input data files.

dataset_generate
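A generator program for this rule could look like the following Python sketch. Only the OUTPUT_DIR environment variable comes from the description above; the input format (a count followed by that many integers), the value ranges, the file names and the fixed seed are all assumptions chosen for illustration.

```python
import os
import random
import tempfile


def generate(output_dir, num_cases=3, seed=42):
    """Write num_cases random input files into output_dir."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    os.makedirs(output_dir, exist_ok=True)
    for i in range(num_cases):
        n = rng.randint(1, 10)
        values = [rng.randint(1, 100) for _ in range(n)]
        # The basename must not contain a period, so use 10_random0, 10_random1, ...
        path = os.path.join(output_dir, f"10_random{i}.in")
        with open(path, "w") as f:
            f.write(f"{n}\n")
            f.write(" ".join(map(str, values)) + "\n")


if __name__ == "__main__":
    # The rule provides OUTPUT_DIR; fall back to a temporary directory
    # when the script is run by hand.
    generate(os.environ.get("OUTPUT_DIR") or tempfile.mkdtemp())
```

Seeding the random number generator is a design choice rather than a requirement stated here: it makes repeated builds produce the same data files.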

The dataset_derive rule extends a dataset by running a program once for each test case in the input dataset. By default, the data file with the input file extension (.in) is connected to the program's standard input, and the program's standard output is written to a new data file with the output file extension (.ans). The output dataset is built by combining the data files from the input dataset with the generated output files. This rule is typically used to generate answer data files by running a reference solution program over input data files.

dataset_derive
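The per-test-case behavior described above can be sketched in Python as follows. This is an illustration only, not the rule's implementation: it works on an unpacked directory rather than a zip archive, and the function name derive and its parameters are hypothetical. The default extensions match the ones stated above.

```python
import subprocess
from pathlib import Path


def derive(dataset_dir, command, input_ext="in", output_ext="ans"):
    """Run command once per input file, redirecting stdin and stdout."""
    created = []
    for input_path in sorted(Path(dataset_dir).glob(f"*.{input_ext}")):
        # 00_sample1.in -> 00_sample1.ans (same basename, new extension)
        output_path = input_path.with_suffix(f".{output_ext}")
        with open(input_path, "rb") as stdin, open(output_path, "wb") as stdout:
            # check=True raises if the program exits with a non-zero code.
            subprocess.run(command, stdin=stdin, stdout=stdout, check=True)
        created.append(output_path.name)
    return created
```

After the loop, the directory holds both the original input files and the new answer files, which corresponds to the combined output dataset.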

Testing datasets

It is important to ensure that built datasets are correctly formatted and satisfy the problem constraints. Currently, one rule is provided for testing datasets.

The dataset_test rule tests a dataset by running a program once for each test case in the dataset. The data file with the input file extension (default: .in) is connected to the program's standard input. The test passes if the program exits normally (exit code 0) for every test case.

dataset_test
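A validator program for this rule reads one input file from standard input and signals the result through its exit code. The Python sketch below assumes a hypothetical problem whose input is a line with a count n (1 to 10) followed by a line with n integers (1 to 100); the format and limits are invented for illustration and are not part of rules_contest.

```python
import sys


def parse_int(token):
    """Return the value of a plain decimal token, or None if it is malformed."""
    if token.isascii() and token.isdigit():
        return int(token)
    return None


def validate(text):
    """Return True if text is exactly two newline-terminated, well-formed lines."""
    lines = text.split("\n")
    if len(lines) != 3 or lines[2] != "":
        return False
    n = parse_int(lines[0])
    if n is None or not 1 <= n <= 10:
        return False
    # Splitting on a single space rejects doubled or trailing spaces.
    values = [parse_int(token) for token in lines[1].split(" ")]
    if len(values) != n:
        return False
    return all(v is not None and 1 <= v <= 100 for v in values)


def main():
    # Exit code 0 means the test case passed; any other code means failure.
    sys.exit(0 if validate(sys.stdin.read()) else 1)
```

To use the sketch as a standalone program, call main() at the end of the file under the usual `if __name__ == "__main__":` guard. The checks are deliberately strict about whitespace, because a validator is most useful when it rejects inputs that lenient parsers would silently accept.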