Stefan Krawczyk, Co-creator of Hamilton
Feature engineering in Python with Hamilton: Portability & Lineage
In the field of feature engineering, practitioners regularly have to build, manage, and iterate on batch jobs that create inputs to models. Odds are that they then have to translate the logic from those jobs into code that runs in a web-service or streaming setting, where it processes data to make fresh predictions. This process can be an engineering headache and make it hard for teams to collaborate. It can also produce complex code, highly bespoke infrastructure, and hard-to-detect discrepancies between training and inference.
One solution is to use Hamilton, an opinionated, lightweight, open-source micro-framework in Python that enables data practitioners to define dataflows cleanly and portably and to understand their lineage. For example, one can create feature transforms and understand feature lineage. Hamilton places no restrictions on the nature of transformations, so practitioners can use their favorite Python libraries. With Hamilton, you can run the same code in your Airflow DAG for training as in your FastAPI service for inference, get the same result, and also get lightweight lineage that aids reproducibility and collaboration with little extra effort.
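To make the paradigm concrete without depending on Hamilton itself, here is a toy, pure-Python resolver. It is a sketch under stated assumptions, not Hamilton's implementation or API. It mimics the convention that a function's name is the output it produces and its parameter names are the outputs or inputs it depends on. The transform names (`spend_per_signup` and so on) and the `execute` helper are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, List


# Hypothetical feature transforms in the Hamilton style: the function name is
# the value it produces; each parameter name is another function's output or
# an input supplied at execution time.
def spend_per_signup(spend: float, signups: int) -> float:
    """Marketing spend divided by the number of signups."""
    return spend / signups


def spend_zero_mean(spend: float, avg_spend: float) -> float:
    """Spend centred on an externally supplied average."""
    return spend - avg_spend


def spend_zero_mean_per_signup(spend_zero_mean: float, signups: int) -> float:
    """Depends on another transform simply by naming it as a parameter."""
    return spend_zero_mean / signups


def execute(funcs: List[Callable[..., Any]], outputs: List[str],
            inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy resolver (hypothetical, not Hamilton's): computes the requested
    outputs by recursively resolving each parameter by name. No cycle checks."""
    registry = {f.__name__: f for f in funcs}
    cache: Dict[str, Any] = dict(inputs)  # user-supplied inputs are leaf nodes

    def resolve(name: str) -> Any:
        if name not in cache:
            if name not in registry:
                raise KeyError(f"no function or input named {name!r}")
            fn = registry[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)  # each node is computed at most once
        return cache[name]

    return {name: resolve(name) for name in outputs}


result = execute(
    [spend_per_signup, spend_zero_mean, spend_zero_mean_per_signup],
    outputs=["spend_per_signup", "spend_zero_mean_per_signup"],
    inputs={"spend": 100.0, "signups": 4, "avg_spend": 60.0},
)
print(result)  # {'spend_per_signup': 25.0, 'spend_zero_mean_per_signup': 10.0}
```

Only the requested outputs and their dependencies are computed, and the dependency graph is implied entirely by function signatures. With the real library, the transforms would live in a module handed to Hamilton's driver rather than to this hand-rolled `execute`; the Hamilton documentation describes the driver API.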
In this talk, we will present Hamilton and discuss how it enables practitioners and their teams to build highly portable dataflows that run in a variety of contexts, as well as how to use its lineage capabilities. More specifically, we will discuss:
- The paradigm Hamilton introduces, and how it simplifies building and maintaining feature engineering pipelines.
- How the same feature transforms can be used in batch data preparation for training and inference, as well as in a web-service or streaming context, with minimal changes.
- How Hamilton’s lineage capabilities work and how to utilize them.
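The portability and lineage points can also be sketched in plain Python. Assuming transforms are defined once in the Hamilton style, the same functions can back both a batch job and a single-request handler, and lineage can be read directly from function signatures. `compute`, `batch_job`, `handle_request`, and `lineage` are hypothetical helpers for illustration, not Hamilton functions; in practice the batch path might run inside an Airflow task and the request path inside a FastAPI endpoint.

```python
import inspect
import math
from typing import Any, Callable, Dict, List, Set


# Hypothetical transforms, defined once in the Hamilton style
# (function name = output, parameter names = dependencies).
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups if signups else 0.0


def log_spend_per_signup(spend_per_signup: float) -> float:
    return math.log1p(spend_per_signup)


REGISTRY: Dict[str, Callable[..., Any]] = {
    f.__name__: f for f in (spend_per_signup, log_spend_per_signup)
}


def compute(name: str, inputs: Dict[str, Any]) -> Any:
    """Toy resolver: raw inputs are leaves, everything else is a registered
    function. No caching, for brevity."""
    if name in inputs:
        return inputs[name]
    fn = REGISTRY[name]
    return fn(**{p: compute(p, inputs) for p in inspect.signature(fn).parameters})


def batch_job(rows: List[Dict[str, Any]], feature: str) -> List[Any]:
    """Batch context (e.g. a training-data task): apply the transforms to many rows."""
    return [compute(feature, row) for row in rows]


def handle_request(payload: Dict[str, Any], feature: str) -> Dict[str, Any]:
    """Online context (e.g. a web-service endpoint body): same transforms, one record."""
    return {feature: compute(feature, payload)}


def lineage(name: str) -> Set[str]:
    """Everything `name` transitively depends on, read from function signatures."""
    if name not in REGISTRY:
        return set()  # a raw input has no upstream
    upstream: Set[str] = set()
    for p in inspect.signature(REGISTRY[name]).parameters:
        upstream |= {p} | lineage(p)
    return upstream


rows = [{"spend": 100.0, "signups": 4}, {"spend": 30.0, "signups": 0}]
print(batch_job(rows, "log_spend_per_signup"))
print(handle_request(rows[0], "log_spend_per_signup"))
print(sorted(lineage("log_spend_per_signup")))  # ['signups', 'spend', 'spend_per_signup']
```

Because both paths call the same `compute` over the same functions, a training row and an online request with identical fields yield identical feature values, which is the consistency between training and inference described above.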