Since dlt generates a schema, tracks schema evolution, contains lineage information, and follows the data vault standard, it can easily provide metadata or lineage info to other tools.
At the same time, dlt is a pipeline building tool first - so if people want to read metadata from somewhere and store it elsewhere, they can.
If you mean taking in metadata the way we integrate with arrow - whether the community wants that or would find it useful remains to be seen. We will not develop plugins just to collect cobwebs, but if there are interested users we will add it to our backlog.
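For a concrete picture, here is a minimal sketch of pulling that metadata out of a pipeline, assuming dlt's documented load info from `pipeline.run()` and the `default_schema.to_pretty_yaml()` export; the pipeline and data are hypothetical:

```python
import dlt

# A hypothetical pipeline loading a few rows into DuckDB.
pipeline = dlt.pipeline(
    pipeline_name="metadata_demo",
    destination="duckdb",
    dataset_name="demo",
)

load_info = pipeline.run(
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
    table_name="users",
)

# load_info carries metadata about the load: package ids, timings, failed jobs.
print(load_info)

# The inferred schema (tables, columns, types, hints) can be exported as YAML
# and handed to catalog or lineage tools.
print(pipeline.default_schema.to_pretty_yaml())
```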
Thanks for the response.
I also noticed a mention of data contracts or Pydantic for keeping your data clean. Would it make sense to embed that as part of a dlt pipeline, or is the recommendation to include it as part of the transformation step?
We have a PR (https://github.com/dlt-hub/dlt/pull/594) that is about to merge and makes schema evolution highly configurable, anywhere between free evolution and hard stopping (see the sketch after this list):
- you will be able to totally freeze the schema and reject bad rows
- or accept data for existing columns but not new columns
- or accept some fields based on rules
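To make that concrete, a hedged sketch of how such contracts could be declared once the PR lands, assuming the `schema_contract` argument and Pydantic-model `columns` support described in dlt's documentation; the resource, modes, and pipeline names are illustrative:

```python
import dlt
from pydantic import BaseModel

# A Pydantic model used as a data contract for the resource's rows.
class User(BaseModel):
    id: int
    name: str

# columns=User derives the table schema from the model; schema_contract
# picks how to treat rows that would change that schema.
@dlt.resource(
    name="users",
    columns=User,
    schema_contract={
        "tables": "evolve",          # new tables are allowed
        "columns": "freeze",         # reject rows that introduce new columns
        "data_type": "discard_row",  # drop rows whose types don't fit
    },
)
def users():
    yield {"id": 1, "name": "alice"}

pipeline = dlt.pipeline(pipeline_name="contracts_demo", destination="duckdb")
pipeline.run(users())
```

Declaring the contract on the resource keeps validation inside the load step, so bad rows are handled before any downstream transformation runs.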