Usage
Overview
ParSNIP is a generative model of astronomical transient light curves. It is designed to
work with light curves in sncosmo format, using the lcdata package to handle large
datasets. See the lcdata documentation for details on how to download or ingest
different datasets.
Training a model
ParSNIP provides a built-in script called parsnip_train that can be used to train a
model on an lcdata dataset. It takes as input the path that the model will be saved to,
followed by a list of paths to datasets. For example:
$ parsnip_train ./model.pt ./dataset_1.h5 ./dataset_2.h5
This will train a model named model.pt using the datasets dataset_1.h5 and dataset_2.h5.
Generating predictions
The parsnip_predict script can be used to generate predictions given an lcdata dataset
and a pretrained ParSNIP model. For example:
$ parsnip_predict ./predictions.h5 ./model.h5 ./dataset.h5
This will write predictions to the file predictions.h5 using the dataset dataset.h5 and
the model model.h5.
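The two scripts can also be driven from Python, for instance as steps in a larger pipeline. The following is a minimal sketch that relies only on the positional argument order shown above. The helper names train_command, predict_command and run_if_installed are hypothetical and not part of ParSNIP, and the file paths are placeholders.

```python
import shutil
import subprocess


def train_command(model_path, dataset_paths):
    """Build the parsnip_train call: output model path first, then the datasets."""
    return ["parsnip_train", str(model_path), *map(str, dataset_paths)]


def predict_command(predictions_path, model_path, dataset_path):
    """Build the parsnip_predict call: predictions file, model, then dataset."""
    return [
        "parsnip_predict",
        str(predictions_path),
        str(model_path),
        str(dataset_path),
    ]


def run_if_installed(command):
    """Run the command if its script is on the PATH; return True if it ran."""
    if shutil.which(command[0]) is None:
        print("Skipping, script not found:", " ".join(command))
        return False
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return True


# Placeholder paths; nothing is executed here, the commands are only printed.
train = train_command("./model.pt", ["./dataset_1.h5", "./dataset_2.h5"])
predict = predict_command("./predictions.h5", "./model.pt", "./dataset_1.h5")
print(" ".join(train))
print(" ".join(predict))
```

Calling run_if_installed(train) followed by run_if_installed(predict) would then execute the two steps in order, stopping with an exception if either script exits with an error.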
Loading a dataset in Python
ParSNIP is designed to work with lcdata datasets. These datasets are guaranteed to be in
a specific format, but they may include instrument-specific quirks, light curves that
are not compatible with ParSNIP, or metadata in unusual formats (e.g. PLAsTiCC types are
encoded as arbitrary integers). ParSNIP includes tools to clean up datasets from a range
of different surveys and reject invalid light curves. Given an lcdata dataset, this can
be done with:
>>> dataset = parsnip.parse_dataset(raw_dataset, kind='ps1')
Here kind specifies the type of dataset, in this case one from Pan-STARRS1. Currently
supported options include:
ps1
ztf
plasticc
A convenience function is also included to read lcdata datasets in HDF5 format and
parse them automatically:
>>> dataset = parsnip.load_dataset('/path/to/data.h5')
This function will attempt to determine the dataset kind from the filename. This can be
overridden with the kind keyword as in the previous example.
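To make the idea of filename-based detection concrete, the sketch below shows one way such a guess could work. It is purely illustrative: guess_kind is a hypothetical helper, the substring rule is an assumption, and the rule ParSNIP actually applies may differ. When the filename is ambiguous, passing kind explicitly is the safer choice.

```python
import os

# The dataset kinds listed above.
KNOWN_KINDS = ("plasticc", "ps1", "ztf")


def guess_kind(path, kind=None):
    """Hypothetical helper: return `kind` if given, else guess it from the filename."""
    if kind is not None:
        # An explicit kind always takes precedence over the filename.
        return kind
    name = os.path.basename(path).lower()
    for candidate in KNOWN_KINDS:
        if candidate in name:
            return candidate
    raise ValueError(f"Cannot guess dataset kind from {name!r}; pass kind explicitly")


print(guess_kind("/path/to/ps1_data.h5"))
print(guess_kind("/path/to/data.h5", kind="ztf"))
```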
Loading a model in Python
Once a model has been trained, ParSNIP provides an extensive Python API for manipulating it and using it to generate predictions and plots. To load a model in Python:
>>> import parsnip
>>> model = parsnip.load_model('/path/to/model.h5')
There are several built-in models included that can be loaded by specifying their name. Currently, these are:
plasticc: trained on the PLAsTiCC dataset.
ps1: trained on the PS1 dataset from Villar et al. 2020.
plasticc_photoz: trained on the PLAsTiCC dataset. Uses the photometric redshifts instead of the true redshifts.
To load one of these built-in models:
>>> model = parsnip.load_model('plasticc')
Assuming that you have a light curve in sncosmo format, some examples of what can be
done with a model include:
Predict the latent representation of a light curve:
>>> model.predict(light_curve)
{
'object_id': 'PS0909006',
...
's1': 0.19424194,
's1_error': 0.44743112,
's2': -0.051611423,
's2_error': 1.0143535,
...
}
Plot the predicted light curve:
>>> parsnip.plot_light_curve(light_curve, model)
Plot the predicted spectrum at a given time:
>>> parsnip.plot_spectrum(light_curve, model, time=53000.)
See the Reference / API page for a list of all of the built-in methods, or the notebooks that were used to make figures for Boone et al. 2021 for examples.
Classifying light curves
To classify light curves, we first need to predict their representations using a ParSNIP
model. This can be done either with the parsnip_predict script described previously or
by operating in memory on an lcdata Dataset object:
>>> predictions = model.predict_dataset(dataset)
>>> print(predictions)
object_id ra dec ... s3 s3_error
--------- -------- -------- ... ------------- -----------
PS0909006 333.9503 1.1848 ... 0.19424233 0.4474311
PS0909010 37.1182 -4.0789 ... -0.40881702 0.59658796
PS0910012 52.4718 -28.0867 ... -2.142636 0.08176677
PS0910016 35.3073 -3.91 ... -0.31671444 0.5740286
... ... ... ... ... ...
A classifier can be trained on a set of predictions with:
>>> classifier = parsnip.Classifier()
>>> classifier.train(predictions)
The classifier can then be used to generate predictions for a new dataset with:
>>> classifier.predict(new_predictions)
object_id SLSN SNII SNIIn SNIa SNIbc
--------- ----- ----- ----- ----- -----
PS0909006 0.009 0.025 0.031 0.858 0.077
PS0909010 0.001 0.002 0.017 0.954 0.024
PS0910016 0.002 0.002 0.017 0.948 0.032
PSc000001 0.003 0.936 0.038 0.003 0.021
PSc090022 0.960 0.001 0.037 0.001 0.000
... ... ... ... ... ...
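Each row holds one probability per class, so a single label per object can be obtained by taking the class with the highest probability. The sketch below assumes the rows have been copied into plain dictionaries; the values are three of the rows from the table above, and most_likely_class is a hypothetical helper rather than part of ParSNIP.

```python
# Three rows copied from the classifier output shown above.
rows = [
    {"object_id": "PS0909006", "SLSN": 0.009, "SNII": 0.025,
     "SNIIn": 0.031, "SNIa": 0.858, "SNIbc": 0.077},
    {"object_id": "PSc000001", "SLSN": 0.003, "SNII": 0.936,
     "SNIIn": 0.038, "SNIa": 0.003, "SNIbc": 0.021},
    {"object_id": "PSc090022", "SLSN": 0.960, "SNII": 0.001,
     "SNIIn": 0.037, "SNIa": 0.001, "SNIbc": 0.000},
]


def most_likely_class(row):
    """Return (class name, probability) for the highest-probability class."""
    probabilities = {key: value for key, value in row.items() if key != "object_id"}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


labels = {row["object_id"]: most_likely_class(row) for row in rows}
for object_id, (label, probability) in labels.items():
    print(object_id, label, probability)
```

Keeping the probability alongside the label makes it easy to apply a confidence threshold later, for example to set aside objects whose best class is only weakly preferred.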
For more details and examples, see the classification demo notebook.
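The steps in this section can be combined into a single function. This is a minimal sketch that uses only the calls shown on this page (load_model, load_dataset, predict_dataset, Classifier, train and predict). The function name classify_new_dataset and its arguments are hypothetical, and it assumes that the first dataset carries the type labels the classifier needs for training.

```python
def classify_new_dataset(model_name, labeled_path, new_path):
    """Train a classifier on one lcdata dataset and apply it to another."""
    # Imported inside the function so the sketch can be read and imported
    # without ParSNIP installed; calling it does require ParSNIP.
    import parsnip

    # Either a built-in model name or a path to a trained model.
    model = parsnip.load_model(model_name)

    # Load and clean both datasets; the kind is guessed from each filename.
    labeled = parsnip.load_dataset(labeled_path)
    new = parsnip.load_dataset(new_path)

    # Predict the latent representations, then train and apply the classifier.
    classifier = parsnip.Classifier()
    classifier.train(model.predict_dataset(labeled))
    return classifier.predict(model.predict_dataset(new))
```

A call such as classify_new_dataset('ps1', '/path/to/ps1_labeled.h5', '/path/to/ps1_new.h5') would return a table of class probabilities like the one shown earlier; both paths are placeholders.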