Datasets

Online Data Explorer

You can explore and visualize the latest dataset that has been extracted from Coq online. The visualization should give you some intuition about what kind of data is available and how it is encoded. If this data is interesting to you, you can find download and usage instructions below.

You can find the main entry-point for the visualization here. From there, you can explore the entire dataset. Examples of objects include:

Module hierarchies showing dependencies between compilation units
Global contexts of a compilation units
Individual definitions within a global context
Tactical proofs of theorems
Individual Proof transformations of tactical proofs

Downloads

Datasets are published on Zenodo:

Data for API v15 on Coq v8.11

Accessing the data

The data is encoded using the Cap’n Proto serialization protocol, allowing for fast random access to the graph and metadata from many programming languages. For Python, a library called PyTactician is provided that allows for easy and efficient access to the data. It includes software to visualize the dataset, a sanity checker for the dataset, an example prediction server that interfaces with Coq, an Oracle prediction server and an example proof exploration client. This library is a good starting point to explore the dataset. If you are looking to use another language to access the data, your starting point would be the graph_api.capnp schema file.

The data in the archive is represented as a SquashFS image dataset.squ. In order to access the data, you can either mount it or decompress it. Mounting is preferred on machines with limited memory. To mount, you need to install squashfuse through your favorite package manager.

Note for MacOS users: Squashfuse is not available on MacOS. You can instead install squashfs from Brew and decompress the dataset using the unsquashfs instructions below.

Once squashfuse is installed, if you are using PyTactician, you can load the dataset by pointing directly to this image. This will automatically mount the image in directory dataset/. For example:

pytact-check dataset.squ
pytact-visualize dataset.squ

If you prefer to mount manually, or if you are not using the PyTactician library, you can mount the image using

squashfuse dataset.squ dataset/
pytact-check dataset/
pytact-visualize dataset/
umount dataset/ # Unmount the dataset, optional

Finally, you can also unpack the image without mounting it (this is the only option for MacOS). On systems with limited memory or slow hard-disks this will lead to some performance degradations while accessing the dataset. If you wish to unpack, you can run:

unsquashfs -dest dataset/ dataset.squ

Introspecting the dataset with PyTactician

You can use PyTactician as a library to write scripts that extract information from the dataset. To get started, you can take a look at some simple example scripts. Documentation of the API is also available.

Raw data inspection

For each Coq compilation unit X, the dataset includes the original X.v source file. Alongside that file is a X.bin file with Cap’n Proto structure containing the data extracted during the compilation of X.

If you wish to inspect the raw data in the dataset manually, you can use capnp convert to decode an individual file in the dataset to JSON as follows:

cat dataset/coq-tactician-stdlib.8.11.dev/theories/Init/Logic.bin | \
    capnp convert binary:json meta/graph_api.capnp Dataset

You can process the resulting JSON further using, for example, the jq command.