Ahh, good old manual fine-tuning and maintenance. We are adding data contracts for cases like event ingestion, where the schema needs to be strict, or cases where you know ahead of time what to expect.
Our experience comes from startups, which usually do not have time to track down existing knowledge and would rather go out and find or build their own. There you definitely want schema evolution with alerts before curation: load to raw, and curate from there. Picking data out of something that has no schema is called "schema on read", and you can read about its shortcomings. So this approach is both robust and practical.
For the fine-tuning, as I mentioned, data contracts are a PR review and some tweaks away. They will be highly configurable: strict, rule-based evolution, or free evolution. Definitely use alerts to curate evolution events!
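To make the three modes concrete, here is a minimal pure-Python sketch. It is not dlt's actual contract API. The `Contract` class, the `STRICT`/`RULES`/`FREE` names, and the specific rule used for "rule-based" (new columns allowed, type changes rejected) are all hypothetical, chosen only to illustrate the idea. Every accepted schema change is recorded as an alert for later curation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical mode names for "strict", "rule based evolution" and "free evolution".
STRICT, RULES, FREE = "strict", "rules", "free"


@dataclass
class Contract:
    mode: str
    columns: dict[str, type] = field(default_factory=dict)  # column name -> Python type
    alerts: list[str] = field(default_factory=list)         # evolution events to curate

    def check(self, record: dict[str, Any]) -> bool:
        """Return True if the record is accepted, evolving the schema when the mode allows it."""
        # Columns not seen before (None values carry no type information, so skip them).
        new_cols = [c for c, v in record.items() if c not in self.columns and v is not None]
        # Known columns whose value no longer matches the recorded type.
        type_changes = [
            c for c, v in record.items()
            if c in self.columns and v is not None and not isinstance(v, self.columns[c])
        ]
        if self.mode == STRICT and (new_cols or type_changes):
            return False  # strict: any deviation is rejected
        if self.mode == RULES and type_changes:
            return False  # assumed rule: new columns are fine, type changes are not
        for c in new_cols:
            self.columns[c] = type(record[c])
            self.alerts.append(f"new column: {c} ({type(record[c]).__name__})")
        for c in type_changes:  # only reachable in FREE mode
            self.alerts.append(
                f"type change: {c} {self.columns[c].__name__} -> {type(record[c]).__name__}"
            )
            self.columns[c] = type(record[c])
        return True
```

For example, with `Contract(mode=RULES, columns={"id": int, "name": str})`, a record that adds an `email` column is accepted and logged as an alert, while a record with `"id": "3"` is rejected. A real implementation would also need to handle nested data and numeric widening (an int arriving in a float column counts as a type change here).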
Fair enough, especially if explicit alerting is involved.
Have you considered a hybrid solution, something that generates a contract from a large corpus of data, which can then be deployed statically?
I consider "responding to change" a somewhat different scenario from "heterogeneous but not changing". Statically generating a contract from an existing corpus supports the latter.
I could also envision some kind of graceful degradation, where you have a static contract but make dynamic adjustments instead of failing outright when the data does not conform to it.
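The hybrid idea (generate a contract once from a corpus, deploy it statically, then degrade gracefully on violations) could be sketched roughly as below. This is an illustration under stated assumptions, not an existing tool: `infer_contract` and `conform` are hypothetical names, "most common non-null type per column" is an assumed inference rule, and the coerce-else-null-else-drop policy is one arbitrary choice of degradation.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def infer_contract(corpus: list[dict[str, Any]]) -> dict[str, type]:
    """Generate a static contract: the most common non-null type seen for each column."""
    seen: dict[str, Counter] = {}
    for record in corpus:
        for col, value in record.items():
            if value is not None:
                seen.setdefault(col, Counter())[type(value)] += 1
    return {col: counts.most_common(1)[0][0] for col, counts in seen.items()}


def conform(record: dict[str, Any], contract: dict[str, type]) -> tuple[dict[str, Any], list[str]]:
    """Degrade gracefully: coerce, null, or drop non-conforming values instead of failing."""
    clean: dict[str, Any] = {}
    warnings: list[str] = []
    for col, value in record.items():
        expected = contract.get(col)
        if expected is None:
            # Column not in the contract: drop it, but leave a trace for curation.
            warnings.append(f"unknown column dropped: {col}")
        elif value is None or isinstance(value, expected):
            clean[col] = value
        else:
            try:
                # Naive coercion via the type constructor, e.g. int("42") -> 42.
                clean[col] = expected(value)
                warnings.append(f"coerced {col} to {expected.__name__}")
            except (TypeError, ValueError):
                clean[col] = None
                warnings.append(f"nulled {col}: cannot coerce {value!r}")
    return clean, warnings
```

For example, a contract inferred from `[{"id": 1, "price": 9.5}, {"id": 2, "price": 3.0, "note": None}]` is `{"id": int, "price": float}` (the all-null `note` column is left out). Conforming `{"id": "7", "price": "oops", "color": "red"}` against it yields `{"id": 7, "price": None}` plus three warnings rather than an exception. Constructor-based coercion only makes sense for simple scalar types; `bool("false")`, for instance, is `True`, so a real system would need explicit per-type converters.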
I worked with the dlt guys on exactly that: using OpenAI functions to generate a schema from the raw data structure. You can check out that work here: https://github.com/topoteretes/PromethAI-Memory
It's in the level 1 folder.