# Approach
This approach requires the Spider dataset (https://yale-lily.github.io/spider), from which you need `tables.json`. Select the queries you want to test from Spider's dev set and modify `gold.txt` accordingly. This file is very sensitive to spacing: put a tab between each query and its target database, and leave an empty line between queries. Next, check which database each query requires, retrieve those databases from the Spider test suite (https://github.com/taoyds/test-suite-sql-eval), and put them in a database folder that you create. Finally, create an account for Cosette (https://cosette.cs.washington.edu/), get an API key, and put it in `COSETTE_API_KEY.txt`.
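Because `gold.txt` is sensitive to spacing, it can help to generate it rather than edit it by hand. The following is a minimal sketch, assuming the layout described above (query, tab, database name, with an empty line between entries). The helper name `format_gold` and the example queries are illustrative and not part of this repository.

```python
from pathlib import Path


def format_gold(pairs):
    """Render (query, db_id) pairs as: query<TAB>db_id, one empty line between entries."""
    entries = []
    for query, db_id in pairs:
        query = query.strip()
        # Reject tabs and newlines instead of silently rewriting the query,
        # since either one would break the tab-separated layout.
        if "\t" in query or "\n" in query:
            raise ValueError(f"query must be on a single line without tabs: {query!r}")
        entries.append(f"{query}\t{db_id}")
    return "\n\n".join(entries) + "\n"


# Illustrative pairs; replace them with the dev-set queries you selected.
gold_pairs = [
    ("SELECT count(*) FROM singer", "concert_singer"),
    ("SELECT name FROM stadium", "concert_singer"),
]

if __name__ == "__main__":
    Path("gold.txt").write_text(format_gold(gold_pairs), encoding="utf-8")
```

Rejecting malformed queries, rather than collapsing whitespace, avoids accidentally changing string literals inside a query.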

Now, assuming you have defined three gold queries that you want to test against three columns of user input, do the following:
- define the input CSV file in the implementation
- define the database folder
- define the Cosette API key
- activate the `tables.json` (from the Spider dataset)
- define the gold queries you want to test
- define the databases these gold queries come from
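The settings in the list above could be collected in one place, as in the sketch below. All names here (`EvalConfig`, `user_input.csv`, the example queries) are hypothetical and do not reflect the actual variable names in the implementation. The `sqlite_path` helper assumes the databases keep Spider's `<db_id>/<db_id>.sqlite` folder layout.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class EvalConfig:
    """Hypothetical container for the settings listed above."""
    input_csv: Path          # CSV with one column of user queries per gold query
    database_dir: Path       # folder holding the Spider test-suite databases
    api_key_file: Path       # text file containing the Cosette API key
    tables_json: Path        # schema description from the Spider dataset
    gold_queries: List[str]  # gold queries to test against
    gold_dbs: List[str]      # target database for each gold query, same order

    def validate(self) -> None:
        # Every gold query needs exactly one target database.
        if len(self.gold_queries) != len(self.gold_dbs):
            raise ValueError("gold_queries and gold_dbs must have the same length")

    def sqlite_path(self, db_id: str) -> Path:
        # Assumption: Spider layout, <database_dir>/<db_id>/<db_id>.sqlite
        return self.database_dir / db_id / f"{db_id}.sqlite"


config = EvalConfig(
    input_csv=Path("user_input.csv"),
    database_dir=Path("database"),
    api_key_file=Path("COSETTE_API_KEY.txt"),
    tables_json=Path("tables.json"),
    gold_queries=[
        "SELECT count(*) FROM singer",
        "SELECT name FROM stadium",
        "SELECT count(*) FROM concert",
    ],
    gold_dbs=["concert_singer", "concert_singer", "concert_singer"],
)
config.validate()
```

Validating the lengths up front catches a missing database entry before any query is evaluated.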

Then install the following packages through `requirements.txt`:
```
sqlparse
git+https://github.com/ReinierKoops/test-suite-sql-eval.git@installable_pckg#egg=testsuitesqleval
nltk
sql_metadata
numpy
pandas
requests
urllib3
```

That is all. You can now test queries for "approximate" semantic equivalence. The Spider test suite should be adaptable to any query on any database. If you use your own database, make sure to create an appropriate `tables.json` for it.
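For your own database, a `tables.json` entry can be derived from the SQLite schema. The sketch below assumes the entry layout used by Spider's `tables.json` (`db_id`, `table_names`, `column_names`, `column_types`, `primary_keys`, `foreign_keys`, plus the `_original` variants). The function names and the type mapping are illustrative heuristics, so compare the output against an existing Spider entry before relying on it.

```python
import json
import sqlite3


def spider_type(declared: str) -> str:
    """Collapse a declared SQLite type into a coarse type (assumed, heuristic mapping)."""
    t = (declared or "").lower()
    if any(k in t for k in ("int", "real", "float", "double", "numeric", "decimal")):
        return "number"
    if "date" in t or "time" in t:
        return "time"
    if "bool" in t:
        return "boolean"
    return "text"


def build_tables_entry(conn: sqlite3.Connection, db_id: str) -> dict:
    """Build one Spider-style schema entry from an open SQLite connection."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    columns = [[-1, "*"]]  # index 0 is the wildcard column
    types = ["text"]
    primary_keys = []
    index = {}  # (table, column) in lower case -> global column index
    for t_idx, table in enumerate(tables):
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            index[(table.lower(), name.lower())] = len(columns)
            if pk:
                primary_keys.append(len(columns))
            columns.append([t_idx, name])
            types.append(spider_type(col_type))
    foreign_keys = []
    for table in tables:
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            ref_table, src_col, ref_col = row[2], row[3], row[4]
            if ref_col is None:  # implicit reference to the primary key: skipped here
                continue
            src = index.get((table.lower(), src_col.lower()))
            dst = index.get((ref_table.lower(), ref_col.lower()))
            if src is not None and dst is not None:
                foreign_keys.append([src, dst])

    def readable(name: str) -> str:
        return name.replace("_", " ").lower()

    return {
        "db_id": db_id,
        "table_names_original": tables,
        "table_names": [readable(t) for t in tables],
        "column_names_original": columns,
        "column_names": [[t, readable(c)] for t, c in columns],
        "column_types": types,
        "primary_keys": primary_keys,
        "foreign_keys": foreign_keys,
    }


if __name__ == "__main__":
    # "my_db" is a placeholder name; tables.json holds a list of such entries.
    with sqlite3.connect("database/my_db/my_db.sqlite") as conn:
        print(json.dumps([build_tables_entry(conn, "my_db")], indent=2))
```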

# References

```
@inproceedings{Yu&al.18c,
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author    = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev},
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  address   = "Brussels, Belgium",
  publisher = "Association for Computational Linguistics",
  year      = 2018
}
```

```
@InProceedings{ruiqi20,
  author =  {Ruiqi Zhong and Tao Yu and Dan Klein},
  title =   {Semantic Evaluation for Text-to-SQL with Distilled Test Suite},
  year =    {2020},
  booktitle =   {The 2020 Conference on Empirical Methods in Natural Language Processing},
  publisher = {Association for Computational Linguistics},
}
```

```
@inproceedings{10.1145/3035918.3058728,
author = {Chu, Shumo and Li, Daniel and Wang, Chenglong and Cheung, Alvin and Suciu, Dan},
title = {Demonstration of the Cosette Automated SQL Prover},
year = {2017},
isbn = {9781450341974},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3035918.3058728},
doi = {10.1145/3035918.3058728},
abstract = {In this demonstration, we showcase COSETTE, the first automated prover for determining the equivalences of SQL queries. Despite theoretical limitations, COSETTE leverages recent advances in both automated constraint solving and interactive theorem proving to decide the equivalences of a wide range of real world queries, including complex rewrite rules from the database literature. COSETTE can also validate the inequality of queries by finding counter examples, i.e., database instances which, when executed on the two queries, will return different results. COSETTE can find counter examples of many real world inequivalent queries including a number of real-world optimizer bugs. We showcase three representative applications of COSETTE: proving a query rewrite rule from magic set rewrite, finding counter examples from the infamous optimizer bug, and an interactive visualization of automated grading results powered by COSETTE, where COSETTE is used to check the equivalence of students' answers to the standard solution. For the demo, the audience can experience through the three applications, and explore the COSETTE by interacting with the tool using an easy-to-use web interface.},
booktitle = {Proceedings of the 2017 ACM International Conference on Management of Data},
pages = {1591--1594},
numpages = {4},
keywords = {sql, correctness, query processing, education},
location = {Chicago, Illinois, USA},
series = {SIGMOD '17}
}
```