Create your ZTF filter
This tutorial goes step by step through creating a filter, used to define which information will be sent to you by the broker. It is expected that you know the basics of Python and Pandas. The use of Apache Spark is a plus. If you are not at ease with software development, that is also fine! Just contact us with your scientific idea, and we will help you design the filter.
Running Fink in its entirety just to test a module can be an overwhelming task. Fink is a complex system, but fortunately it is highly modular, so you do not need all the parts to test one part in particular. In principle, to test a module you only need Apache Spark installed, and alert data. The Spark API exposes nearly the same methods for static and streaming DataFrames. Hence, to avoid the complications due to streaming (e.g. creating streams with Kafka, reading streams, managing offsets, etc.), it is always best to prototype on a static DataFrame. If the logic works on static data, it will work on streams.
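As an illustration, here is a minimal sketch of this workflow. The path is a placeholder, and the column name follows the ZTF alert schema (check the schemas for your own fields); alert data saved as parquet, e.g. obtained from the Data Transfer service described below, is assumed:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read alert data stored on disk: this gives a static DataFrame
df = spark.read.format("parquet").load("/path/to/alert_data")

# Prototype your selection on the static DataFrame; the same call
# works unchanged on the streaming DataFrame used in production
df_filtered = df.filter(df["candidate.magpsf"] > 20.5)
df_filtered.show()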
Development environment
First, fork and clone the fink-filters repository on your machine, and create a new folder in fink_filters/. The name of the folder does not matter much, but try to make it as meaningful as possible!
To make sure you are working in the correct environment, with the exact versions of the dependencies used by Fink, we recommend using the Fink Docker image. Download the image and mount your version of fink-filters in a container:
# 2.3GB compressed
docker pull julienpeloton/fink-ci:latest
# Assuming you are in /path/to/fink-filters
docker run -t -i --rm -v \
$PWD:/home/libs/fink-filters \ # (1)!
julienpeloton/fink-ci:latest bash
- Mount a volume to persist the data you generate and use inside the Docker container.
The advantage of this method is that everything is already installed (Python and the various frameworks). Beware, the image is quite big... You should see some logs appearing when entering the container (this is expected). Finally, activate the environment and remove the pre-installed version of fink-filters from the container:
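For example (a sketch only; the exact activation command depends on the image version, so check the image documentation):

# activate the environment shipped with the image
# (sourcing ~/.bash_profile is an assumption here -- adapt to your image)
source ~/.bash_profile

# remove the pre-installed version so that your mounted copy
# in /home/libs/fink-filters is used instead
pip uninstall -y fink-filters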
Filter design
A filter is typically a Python routine that selects which alerts need to be sent, based on user-defined criteria. Criteria are based on the alert fields (position, flux, properties, ...) plus all the added values from the Fink science modules. We recommend checking the alert schemas before starting.
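If you already have some alert data at hand (for instance downloaded with the Data Transfer service described below), a quick way to see which fields are available, including the Fink added values, is to print the schema; the path below is a placeholder:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Display all alert fields, including the added values from the science modules
spark.read.format("parquet").load("/path/to/alert_data").printSchema()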
A filter typically contains two parts: the filter module, which contains the main routine called by Fink, and any other modules used by the filter:
.
├── fink_filters
│   ├── filter_dyson_sphere
│   │   ├── __init__.py
│   │   ├── filter.py # (1)!
│   │   └── mymodule.py
- The filename filter.py is mandatory. All the remaining files or folders can have any name.
A full example of a filter can be found at https://github.com/astrolabsoftware/fink-filters/tree/master/tutorial where we focus on alerts with a counterpart in the SIMBAD database and a magnitude above 20.5.
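To fix ideas, here is a minimal sketch of what the main routine in filter.py could look like for this example. It is not the tutorial's exact code: the column names (cdsxmatch from the Fink crossmatch module, candidate.magpsf from the ZTF alert packet) and the function name are illustrative, so check the schemas and the linked tutorial for the real thing:

import pandas as pd

from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import BooleanType


def simbad_faint_(cdsxmatch: pd.Series, magpsf: pd.Series) -> pd.Series:
    """Pandas-level logic: SIMBAD counterpart and magnitude above 20.5

    Examples
    --------
    >>> cdsxmatch = pd.Series(["RRLyr", "Unknown", "QSO"])
    >>> magpsf = pd.Series([21.0, 22.0, 19.5])
    >>> simbad_faint_(cdsxmatch, magpsf).tolist()
    [True, False, False]
    """
    # alerts without a SIMBAD counterpart are flagged "Unknown"
    has_counterpart = cdsxmatch != "Unknown"

    # magnitude above 20.5 means fainter than 20.5
    is_faint = magpsf > 20.5

    return has_counterpart & is_faint


@pandas_udf(BooleanType())
def simbad_faint(cdsxmatch: pd.Series, magpsf: pd.Series) -> pd.Series:
    """Spark entry point called on the (static or streaming) DataFrame"""
    return simbad_faint_(cdsxmatch, magpsf)

The pandas-level function carries the logic and its doctests, while the thin pandas UDF wrapper is what gets applied to DataFrame columns and returns the boolean column used to select alerts.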
Filter test
The more tests the better! Typically, we expect at least unit tests using doctest for all functions (see an example here). Once you have written your unit tests, you can easily run them:
# in /home/libs/fink-filters
./run_tests.sh --single_module fink_filters/filter_dyson_sphere/filter.py
You should see some harmless Spark logs of the form:
/spark-3.4.1-bin-hadoop3/python/pyspark/sql/pandas/functions.py:399: UserWarning:
In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF
instead of specifying pandas UDF type which will be deprecated in the future releases.
See SPARK-28264 for more details.
Then, if there are no errors, you will see the coverage report:
Combined data file .coverage.peloton.494915.XGtzWZRx
...
...
Name Stmts Miss Cover Missing
-----------------------------------------------------------------------------------------
...
fink_filters/filter_dyson_sphere/filter.py      44      6    86%   39-42, 59-61
...
Need more representative test data?
There is test data in fink-filters already, but it might not be representative enough of your science case. In that case, the best option is to use the Data Transfer service to get data tailored to your test.
Authentication
Make sure you have an account to use the fink-client.
Once you have an account, install the client and register your credentials in the container:
# Install the client
pip install fink-client
# register using your credentials
fink_client_register ...
Trigger a job on the Data Transfer service and download the data in your container (the night of July 12 2024 is a good one to start with, with only 17k alerts):
# Change accordingly
TOPIC=ftransfer_ztf_2024-07-16_682277
mkdir -p /data/$TOPIC
fink_datatransfer \
-topic $TOPIC \
-outdir /data/$TOPIC \
-partitionby finkclass \
--verbose
and specify this data path in your test:
# usually at the end of filter.py
...
if __name__ == "__main__":
    """ Execute the test suite """
    globs = globals()
    custom_path = "file:///data/ftransfer_ztf_2024-07-16_682277"
    globs["custom_path"] = custom_path
    ...
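Your doctests can then read this path instead of (or in addition to) the default test data. A minimal sketch, assuming the data was downloaded with the command above, that the test suite injects custom_path and a spark session into the doctest globals, and using the hypothetical simbad_faint routine from the sketch above as a stand-in for your own filter:

# inside a function docstring in filter.py
>>> df = spark.read.format("parquet").load(custom_path)
>>> out = df.filter(simbad_faint(df["cdsxmatch"], df["candidate.magpsf"]))
>>> assert out.count() > 0  # adjust the expectation to your night of data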
Submit your filter
Once you are ready (either the filter is done, or you are stuck and want help), open a Pull Request on the fink-filters repository on GitHub, and we will review the filter and test it extensively before deployment.
Once your filter is accepted and deployed, you will then be able to receive these alerts in (near) real-time using the fink-client, or access them at any time in the Science Portal.