Online Data Explorer
You can explore and visualize the latest dataset that has been extracted from Coq online. The visualization should give you some intuition about what kind of data is available and how it is encoded. If this data is interesting to you, you can find download and usage instructions below.
You can find the main entry-point for the visualization here. From there, you can explore the entire dataset. Examples of objects include:
- Module hierarchies showing dependencies between compilation units
- Global contexts of a compilation units
- Individual definitions within a global context
- Tactical proofs of theorems
- Individual Proof transformations of tactical proofs
Downloads
Datasets are published on Zenodo:
Accessing the data
The data is encoded using the Cap’n Proto serialization protocol, allowing for fast random access to the graph and metadata from many programming languages. For Python, a library called PyTactician is provided that allows for easy and efficient access to the data. It includes software to visualize the dataset, a sanity checker for the dataset, an example prediction server that interfaces with Coq, an Oracle prediction server and an example proof exploration client. This library is a good starting point to explore the dataset. If you are looking to use another language to access the data, your starting point would be the graph_api.capnp schema file.
The data in the archive is represented as a SquashFS image dataset.squ
.
In order to access the data, you can either mount it or decompress it. Mounting
is preferred on machines with limited memory. To mount, you need to install
squashfuse through your favorite package manager.
Note for MacOS users: Squashfuse is not available on MacOS. You can instead
install squashfs from Brew and decompress the dataset using the
unsquashfs
instructions below.
Once squashfuse is installed, if you are using PyTactician, you can load the
dataset by pointing directly to this image. This will automatically mount the
image in directory dataset/
. For example:
pytact-check dataset.squ
pytact-visualize dataset.squ
If you prefer to mount manually, or if you are not using the PyTactician library, you can mount the image using
squashfuse dataset.squ dataset/
pytact-check dataset/
pytact-visualize dataset/
umount dataset/ # Unmount the dataset, optional
Finally, you can also unpack the image without mounting it (this is the only option for MacOS). On systems with limited memory or slow hard-disks this will lead to some performance degradations while accessing the dataset. If you wish to unpack, you can run:
unsquashfs -dest dataset/ dataset.squ
Introspecting the dataset with PyTactician
You can use PyTactician as a library to write scripts that extract information from the dataset. To get started, you can take a look at some simple example scripts. Documentation of the API is also available.
Raw data inspection
For each Coq compilation unit X
, the dataset includes the original X.v
source file.
Alongside that file is a X.bin
file with Cap’n Proto structure containing the
data extracted during the compilation of X
.
If you wish to inspect the raw data in the dataset manually, you can use capnp convert to decode an individual file in the dataset to JSON as follows:
cat dataset/coq-tactician-stdlib.8.11.dev/theories/Init/Logic.bin | \
capnp convert binary:json meta/graph_api.capnp Dataset
You can process the resulting JSON further using, for example, the jq
command.