This issue stemmed from one primary and one secondary cause.
The primary cause was that both the nightly and backfill workflows had been modified to use a feature built into Airflow that waits a specified time before the workflow proceeds. That mechanism appeared to measure its wait from the start of the workflow rather than from the completion of the previous task. We therefore switched back to the prior mechanism, a call to time.sleep in a wait task. The reversion was incomplete, however: it omitted the trigger rule that lets the wait task run once all prior tasks are complete, regardless of their final disposition. Under Airflow's default trigger rule (all_success), a task does not run when any task upstream of it has failed.
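As a rough illustration of the complete reversion, the sketch below pairs a time.sleep wait task with the all_done trigger rule. It assumes Airflow 2.4 or later; the delay value, the DAG id, and the surrounding task names are hypothetical placeholders, not taken from the actual workflows.

```python
import time
from datetime import datetime

# Hypothetical delay; the real value used by the workflows is not stated here.
WAIT_SECONDS = 15 * 60


def wait(seconds: float = WAIT_SECONDS) -> float:
    """Block for `seconds`, counted from the moment this task starts running."""
    time.sleep(seconds)
    return seconds


# Keyword arguments for the wait task. "all_done" asks Airflow to run the task
# once every upstream task has finished, whether it succeeded or failed.
WAIT_TASK_KWARGS = {
    "task_id": "wait",
    "python_callable": wait,
    "trigger_rule": "all_done",
}


def build_dag():
    """Assemble an illustrative DAG (hypothetical names throughout).

    Airflow is imported here, not at module level, so the definitions above
    can be loaded and checked on a machine without Airflow installed.
    """
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="nightly_example",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        previous = EmptyOperator(task_id="previous_task")
        wait_task = PythonOperator(**WAIT_TASK_KWARGS)
        following = EmptyOperator(task_id="following_task")
        previous >> wait_task >> following
    return dag
```

Because time.sleep starts counting when the wait task itself begins, the delay is measured from the end of the previous task. Omitting trigger_rule leaves the task on the default all_success rule.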
The secondary cause, which we addressed first, involved the deployment mechanism.
This has been fixed. Further work is under way to investigate why the failures occurred, to add notification on failure, and to address the deployment issue, but that work does not affect this incident.