# Property-Based Testing in Haskell with QuickCheck

This dataset was collected as part of the research study "PBT in the Wild", which examines how Haskell projects use property-based testing (PBT) with QuickCheck in the real world.

## Document Sheets

This section describes the data that can be found in each sheet.

| Sheet | What it contains |
|-------|------------------|
| **repository_overview** | One row per repo: stars, version, LOC, test mix |
| **tests_overview**      | One row per sampled property (217 rows) |
| **category_frequency**  | Pivot table: intent label → count |
| **connectives_frequency** | Pivot table: logical construct × repo |

The fields of each sheet are explained in the subsections below.

### repository_overview

| Field | Type | Description |
| ----- | ---- | ----------- |
| `repository_name` | `string` | The name of the repository |
| `link` | `string` | A GitHub link to the repository |
| `stars` | `numeric` | The number of GitHub stars at the time of analysis |
| `version` | `string` | The version of the repository which was analyzed |
| `total pbt` | `numeric` | The total number of property-based tests |
| `other test` | `numeric` | The number of non-PBT tests in the repository (naively counted)  |
| `PBT Percentage` | `string` | Share of property-based tests in the whole test suite |
| `LOC in Haskell` | `numeric` | Lines of Haskell code counted by `cloc` |
| `note` | `string` | Additional note on special cases |

Missing values (`NULL`) are represented by the `-` character.

### tests_overview

| Field | Type | Description |
| ----- | ---- | ----------- |
| `Repository` | `string` | The name of the repository |
| `Test Location (Relative Path …)` | `string` | The relative path of the sample test in its repository |
| `Test name` | `string` | The name of the sampled test |
| `Category` | `string` | The intent label assigned to the sampled test (see PBT Labels) |
| `custom_generator` | `bool` | Whether the property uses or defines a custom generator |
| `custom_shrink` | `bool` | Whether the property uses or defines a custom shrinker |
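As a rough illustration of what the `custom_generator` and `custom_shrink` columns refer to, the sketch below defines a QuickCheck `Gen` and a shrinking function for a hypothetical `Percentage` type. It is not code from any analyzed repository. It shows the two common styles: registering both through an `Arbitrary` instance, and passing them explicitly with `forAllShrink`. We assume a property written in either style would be marked `true` in both columns.

```haskell
import Test.QuickCheck

-- Hypothetical domain type (not from any analyzed repository):
-- a percentage that must stay within 0..100.
newtype Percentage = Percentage Int
  deriving (Show, Eq)

-- Custom generator: only produce values in the valid range.
genPercentage :: Gen Percentage
genPercentage = Percentage <$> choose (0, 100)

-- Custom shrinker: shrink the wrapped Int and drop out-of-range candidates.
shrinkPercentage :: Percentage -> [Percentage]
shrinkPercentage (Percentage n) =
  [Percentage m | m <- shrink n, m >= 0, m <= 100]

-- Style 1: register generator and shrinker through an Arbitrary instance.
instance Arbitrary Percentage where
  arbitrary = genPercentage
  shrink = shrinkPercentage

-- Uses the instance implicitly via the argument type.
prop_viaInstance :: Percentage -> Bool
prop_viaInstance (Percentage n) = n >= 0 && n <= 100

-- Style 2: pass generator and shrinker explicitly.
prop_explicit :: Property
prop_explicit =
  forAllShrink genPercentage shrinkPercentage $ \(Percentage n) ->
    n >= 0 && n <= 100
```

In GHCi, `quickCheck prop_viaInstance` and `quickCheck prop_explicit` should both report 100 passing tests.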


### category_frequency

A two-column pivot from `tests_overview`:

* `Category`  – Intent label  
* `Test amount` – Count of rows carrying that label

Percentages reported in the paper are computed as `Test amount / 217`, where 217 is the number of sampled properties in `tests_overview`.
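The formula is simple arithmetic. A small sketch in Haskell is shown below; the names `sampleSize`, `shareOf` and `roundTo` are hypothetical helpers, not part of the dataset. The same denominator also reproduces the corrected figure in the errata table, 48 / 217 ≈ 22.12 %.

```haskell
-- Number of sampled properties in the tests_overview sheet.
sampleSize :: Int
sampleSize = 217

-- Share (in percent) of the sample represented by a given count.
shareOf :: Int -> Double
shareOf testAmount = 100 * fromIntegral testAmount / fromIntegral sampleSize

-- Round to a fixed number of decimal places for reporting.
roundTo :: Int -> Double -> Double
roundTo digits x = fromIntegral (round (x * factor) :: Integer) / factor
  where
    factor = 10 ^ digits
```

For example, `roundTo 2 (shareOf 48)` evaluates to `22.12`, while `shareOf 54` is about 24.88, which the printed paper had rounded to 25 %.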

### connectives_frequency

* Rows = logical construct (`forAll`, `==>`, `.&&.`/`.||.`, `not`).  
* Columns = repository names, plus `Total`.  
* Values = number of sampled properties that contain *at least one* occurrence.  
* `note` = miscellaneous corrections and NAs.
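For readers less familiar with these constructs, the sketch below shows each of them in a toy QuickCheck property over standard list functions. The property names are made up for illustration. Because the sheet counts properties with *at least one* occurrence, a property like `prop_conjunction` would add one to the `.&&.`/`.||.` row no matter how many times the operator appears in it.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- forAll: draw inputs from an explicit generator instead of Arbitrary.
prop_forAll :: Property
prop_forAll =
  forAll (listOf (choose (0, 9 :: Int))) $ \xs ->
    all (\x -> x >= 0 && x <= 9) xs

-- ==> : conditional property; inputs failing the precondition are discarded.
prop_implication :: [Int] -> Property
prop_implication xs =
  not (null xs) ==> take 1 (sort xs) == [minimum xs]

-- .&&. : both sub-properties must hold.
prop_conjunction :: [Int] -> Property
prop_conjunction xs =
  (length (sort xs) === length xs) .&&. (sort (sort xs) === sort xs)

-- .||. : at least one sub-property must hold.
prop_disjunction :: Int -> Property
prop_disjunction n = even n .||. odd n

-- not: plain Boolean negation used inside a property.
prop_not :: [Int] -> Bool
prop_not xs = not (null (xs ++ [0]))
```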


## PBT Labels

Each of the analyzed property-based tests was assigned exactly **one** category. Below is a list of all categories and their explanations.

| Category | Explanation |
| -------- | ----------- |
| `Different Paths` | Combining operations in different orders to get to the same result |
| `Hard to Prove, Easy to Verify` | It's difficult to formally prove the code's correctness but easy to verify that the result it gives is correct |
| `Idempotence` | Doing the same operation more than once is the same as doing it once |
| `Invariant` | A property that is preserved by the operation under test, i.e. it holds both before and after execution |
| `Round Trip` | Applying an operation and its inverse must yield the original value. |
| `Test Oracle` | Compare the results against an alternative implementation of the code/algorithm (the oracle) |
| `Structural Induction` | Show that a property holds for the entire input by relying on it holding for smaller parts of the input (e.g. the tail of a list) |
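To make the labels more concrete, the sketch below gives one toy QuickCheck property per category. All of them are built around a hypothetical `insertionSort` defined in the block, plus standard `Data.List` functions. These illustrate one way each label can be read; they are not properties taken from the analyzed repositories.

```haskell
import Data.List (insert, sort)
import Test.QuickCheck

-- Hypothetical function under test: a simple insertion sort.
insertionSort :: [Int] -> [Int]
insertionSort = foldr insert []

-- Helper: is a list in non-decreasing order?
isSorted :: [Int] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Different Paths: increment-then-reverse equals reverse-then-increment.
prop_differentPaths :: [Int] -> Bool
prop_differentPaths xs = map (+ 1) (reverse xs) == reverse (map (+ 1) xs)

-- Hard to Prove, Easy to Verify: checking that the output is ordered is easy.
prop_hardToProve :: [Int] -> Bool
prop_hardToProve xs = isSorted (insertionSort xs)

-- Idempotence: sorting twice is the same as sorting once.
prop_idempotence :: [Int] -> Bool
prop_idempotence xs = insertionSort (insertionSort xs) == insertionSort xs

-- Invariant: inserting into a sorted list keeps it sorted.
prop_invariant :: Int -> [Int] -> Bool
prop_invariant x xs = isSorted (insert x (sort xs))

-- Round Trip: show followed by read yields the original value.
prop_roundTrip :: Int -> Bool
prop_roundTrip n = read (show n) == n

-- Test Oracle: compare against Data.List.sort as the reference.
prop_testOracle :: [Int] -> Bool
prop_testOracle xs = insertionSort xs == sort xs

-- Structural Induction: the result on (x : xs) follows from the result on xs.
prop_structuralInduction :: Int -> [Int] -> Bool
prop_structuralInduction x xs =
  insertionSort (x : xs) == insert x (insertionSort xs)
```

Each property can be run in GHCi with `quickCheck`, e.g. `quickCheck prop_testOracle`.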

## Errata (paper PDF vs. dataset)

The spreadsheet records the final, double-checked numbers.
After the camera-ready PDF was submitted, we noticed four small typographical errors:

| Location in paper | Printed value | Correct value (as in Excel) |
|-------------------|---------------|-----------------------------|
| Abstract – custom generators share | 55.7 % | **55.8 %** |
| Table 2 – other tests (aeson) | 2163 | **1876** |
| Section 3.1 – Invariant slice | 48.5 % | **44.7 %** |
| Section 3.2 – Generator and Shrinking | 54 tests (25 %) | **48 tests (≈ 22.12 %)** |

The last correction also applies to every other place in the paper that refers to this figure. None of these corrections affects the conclusions.