Technological considerations
Technological goal
On the technological front, Fink aims to provide Rubin scientists with robust infrastructure and modern streaming services, so that user-defined science cases can run seamlessly at big-data scale.
Driven by large-scale optical surveys such as the Zwicky Transient Facility (ZTF) and the Rubin Observatory, Fink runs on large scientific cloud infrastructures (VirtualData at Paris-Saclay and CC-IN2P3) and is built on established components such as Apache Spark, Apache Kafka, and Apache HBase.
Programming languages
Python is the primary language for most APIs: it is widely used in the astronomy community, has a large scientific ecosystem, and integrates easily with existing tools. Under the hood, however, Fink also uses several other languages, including Scala, Java, and Rust. Modern codebases often require a variety of programming languages!
Fink is mainly built on Spark Structured Streaming, a module introduced in Spark 2.0 (see paper), and especially on its integration with Apache Kafka (see here). Structured Streaming is a stream processing engine built on the Spark SQL engine, so streaming computations are written with the same DataFrame and SQL API as batch computations. By default, it processes a data stream as a series of small batch jobs, an approach called micro-batch processing. Like the rest of Spark, it provides fast, scalable, and fault-tolerant processing, and it can guarantee end-to-end exactly-once processing when the sources are replayable and the sinks are idempotent.
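To make micro-batching and the exactly-once condition more concrete, here is a toy sketch in plain Python. It is not how Spark implements Structured Streaming: the class `MicroBatchEngine`, its parameters, and the simulated crash are hypothetical. It illustrates the underlying idea: the engine commits a source offset only after a batch has been written, so a failure leads to a replay, and a sink keyed by offset turns that replay into a harmless overwrite instead of a duplicate.

```python
from dataclasses import dataclass, field


@dataclass
class MicroBatchEngine:
    """Hypothetical toy engine illustrating micro-batch processing (not Spark's code).

    source: replayable list of records addressed by offset (like a Kafka partition).
    sink: dict keyed by offset, so writing the same offset twice is idempotent.
    committed: offset of the next record to read; it plays the role of a checkpoint.
    """
    source: list
    max_batch_size: int = 3
    sink: dict = field(default_factory=dict)
    committed: int = 0
    batches_run: int = 0

    def run_one_batch(self, transform, crash_before_commit=False):
        """Process the next micro-batch; return False when no new data is available."""
        start = self.committed
        end = min(start + self.max_batch_size, len(self.source))
        if start >= end:
            return False
        for offset in range(start, end):
            # Idempotent write: a replayed offset overwrites its previous output.
            self.sink[offset] = transform(self.source[offset])
        self.batches_run += 1
        if crash_before_commit:
            # Simulated failure: output was written, but the offset was never committed.
            raise RuntimeError(f"crash after writing offsets {start}..{end - 1}")
        self.committed = end
        return True


def square(x):
    return x * x


engine = MicroBatchEngine(source=[1, 2, 3, 4, 5, 6, 7])
engine.run_one_batch(square)  # offsets 0..2, committed
try:
    engine.run_one_batch(square, crash_before_commit=True)  # offsets 3..5, not committed
except RuntimeError:
    pass
while engine.run_one_batch(square):  # "restart": replays 3..5, then processes 6
    pass
print(sorted(engine.sink.values()))  # [1, 4, 9, 16, 25, 36, 49]
print(engine.batches_run)  # 4
```

Offsets 3 to 5 are processed twice, yet each appears only once in the sink; without the offset-keyed sink, the replay would produce duplicates, which is why the guarantee depends on the sink as well as the engine.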
Broker structure
Fink broker structure
The broker is made of four modules:
- stream2raw: connect to the incoming stream of alerts and archive the raw data on disk.
- raw2science: filter out poor-quality alerts and add value to the remaining alerts using the user-defined science modules.
- distribution: redistribute alerts to users based on user-defined filters (Kafka topics).
- archive: store the alerts together with their scientific added values.
You can install and test all of these components in local mode with moderate resources (see testing Fink).
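The data flow through the four modules can be sketched as plain Python functions chained together. This is a hypothetical, in-memory illustration of the responsibilities listed above, not Fink's code: the alert fields (`objectId`, `rb`, `mag`), the quality threshold, the science module `is_bright`, and the topic name `bright_transients` are all made up for the example. In the actual broker these steps run on the streaming infrastructure described earlier rather than on Python lists and dicts.

```python
"""Hypothetical toy pipeline mirroring the four Fink modules (not Fink's implementation)."""
from collections import defaultdict


def stream2raw(incoming, raw_store):
    """Consume incoming alerts and archive them unchanged."""
    for alert in incoming:
        raw_store.append(dict(alert))
    return raw_store


def raw2science(raw_alerts, science_modules, min_rb=0.55):
    """Drop poor-quality alerts, then add one value per science module."""
    enriched = []
    for alert in raw_alerts:
        if alert["rb"] < min_rb:  # hypothetical quality cut
            continue
        alert = dict(alert)
        for name, module in science_modules.items():
            alert[name] = module(alert)
        enriched.append(alert)
    return enriched


def distribution(enriched, user_filters):
    """Route each alert to every topic whose user-defined filter accepts it."""
    topics = defaultdict(list)
    for alert in enriched:
        for topic, keep in user_filters.items():
            if keep(alert):
                topics[topic].append(alert["objectId"])
    return dict(topics)


def archive(enriched, science_store):
    """Store alerts together with their added values, indexed by object ID."""
    for alert in enriched:
        science_store[alert["objectId"]] = alert
    return science_store


# Usage with three fake alerts.
incoming = [
    {"objectId": "obj1", "rb": 0.90, "mag": 17.2},
    {"objectId": "obj2", "rb": 0.10, "mag": 18.0},  # fails the quality cut
    {"objectId": "obj3", "rb": 0.80, "mag": 19.5},
]
raw = stream2raw(incoming, raw_store=[])
science = raw2science(raw, {"is_bright": lambda a: a["mag"] < 18.0})
topics = distribution(science, {"bright_transients": lambda a: a["is_bright"]})
store = archive(science, science_store={})
print(len(raw), sorted(store), topics)
# 3 ['obj1', 'obj3'] {'bright_transients': ['obj1']}
```

Keeping the raw archive separate from the enriched one means the science step can be rerun on the raw data later, for example after a science module changes.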
Can Fink do everything?
While many analyses can be conducted end-to-end within Fink, we do not always provide every component needed for a complete analysis. This may be due to limits in our expertise or roadmap, the substantial effort required for integration, or the need for proprietary access to external data that we do not have. Instead, we offer interoperable tools that let you export enriched data for further analysis elsewhere. In practice, this is where platforms such as AstroColibri, observation managers such as TOMs, and marshals such as SkyPortal come into play. These tools, which have interfaces for most brokers, help coordinate follow-up observations and further scientific analyses once we have enriched the data.