Appendix A — create_dag Arguments
Both create_dag and create_dags can take any keyword arguments available to Airflow’s DAG object. Additionally, there are some gusty-specific arguments for these functions.
Below we will cover all gusty-specific arguments available in create_dag and create_dags, followed by specific create_dag and create_dags considerations. The gusty-specific arguments can also be used in a DAG’s METADATA.yml.
For the best results, it’s recommended to always use keyword arguments with create_dag and create_dags.
A.1 gusty-specific arguments
latest_only
By default, gusty adds a LatestOnlyOperator at the absolute root of your Airflow DAG, which means that, by default, the tasks in your DAG will not run except for the latest DAG run. You can read more about the LatestOnlyOperator in Airflow’s documentation, but setting latest_only=False will ensure a gusty-generated DAG mirrors Airflow’s default behavior.
root_tasks
You can assign certain tasks to be at the beginning of the DAG by declaring root_tasks, a list of task ids. Any task id that is designated as a root task cannot have a dependencies block.
leaf_tasks
You can assign certain tasks to be at the end of the DAG by declaring leaf_tasks, a list of task ids. Any task id that is designated as a leaf task cannot have a dependencies block.
external_dependencies
A list of key-value pairs in the format of dag_id: task_id, where the dag_id is some upstream DAG and the task_id is the task in that upstream DAG. When set, gusty will create ExternalTaskSensor tasks and place them at the root of the DAG. Set the task_id to all to wait for the entire upstream DAG to complete. See the section on external dependencies for more details.
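As a minimal sketch of the format described above, the list below pairs two upstream DAGs with the tasks to wait for. The DAG and task names are hypothetical, chosen only for illustration.

```python
# Hypothetical upstream DAG and task names, for illustration only.
external_dependencies = [
    # Wait for one specific task in an upstream DAG.
    {"upstream_sales_dag": "load_orders"},
    # Wait for the entire upstream DAG to complete.
    {"upstream_users_dag": "all"},
]

# Each entry holds a single dag_id: task_id pair.
for entry in external_dependencies:
    dag_id, task_id = next(iter(entry.items()))
    print(f"wait for {dag_id} -> {task_id}")
```

A list like this would then be passed as external_dependencies=external_dependencies to create_dag or create_dags.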
dag_constructors
Provide either a list of functions or a dictionary of function names and functions (much like what you would pass to an Airflow DAG’s user_defined_macros) to have your functions available both as YAML constructors in gusty and in Airflow anywhere user_defined_macros are accepted.
gusty will consolidate your user_defined_macros and your dag_constructors so that all are available anywhere you’d expect. Really, you can just use the Airflow DAG object’s user_defined_macros for everything.
list format
The list format for dag_constructors would look like this:
dag_constructors=[your_first_func, your_second_func]
The functions would be accessible based on their function name.
dictionary format
The dictionary format for dag_constructors would look like this:
dag_constructors={
    "your_first_func": your_first_func,
    "your_renamed_func": your_second_func
}
The functions would be accessible by the key name, allowing you, as illustrated above, to rename your functions if you so desire.
Again, you can just use Airflow’s built-in user_defined_macros argument to achieve this same functionality of having your macros available to you anywhere.
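To make the difference between the two formats concrete, here is a small sketch in plain Python. The two helper functions are hypothetical, and the final dictionary comprehension only illustrates the naming rule described above; it is not gusty's implementation.

```python
def days_ago_label(n):
    """Hypothetical helper: format a day offset as a label."""
    return f"{n}_days_ago"


def shout(text):
    """Hypothetical helper: upper-case a string."""
    return text.upper()


# List format: each function is exposed under its own function name.
as_list = [days_ago_label, shout]

# Dictionary format: the key is the exposed name, so shout is renamed to loud.
as_dict = {"days_ago_label": days_ago_label, "loud": shout}

# Illustration only: the names that the list format would expose.
names_from_list = {func.__name__: func for func in as_list}
```

Either as_list or as_dict could be passed as the dag_constructors argument; the dictionary form is the one to reach for when a function should be exposed under a different name.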
wait_for_defaults
A dictionary of values that can be passed to an Airflow ExternalTaskSensor (or BaseOperator).
task_group_defaults
A dictionary of values that can be passed to an Airflow TaskGroup object.
leaf_tasks_from_dict
A dictionary of tasks that you want at the end of your DAG, where the key is the name of the task, and the value is a spec for that task.
leaf_tasks_from_dict={
    "my_dag_is_done": {
        "operator": "airflow.operators.bash.BashOperator",
        "bash_command": "echo done"
    }
}
parse_hooks
If you want to parse another file type, or want to override how gusty parses supported file types, you can pass a dictionary of file extensions and functions to parse those extensions. Your functions should take a file_path argument.
parse_hooks={
    ".sh": your_shell_file_parsing_function
}
See gusty’s built-in parsers here.
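A parsing function of this kind might be sketched as follows. The function name is hypothetical, and the return value is an assumption: it mirrors the task spec dictionary shown for leaf_tasks_from_dict, so check gusty's built-in parsers for the exact shape they return.

```python
from pathlib import Path


def parse_shell_file(file_path):
    """Hypothetical parse hook: turn a .sh file into a BashOperator spec."""
    # Read the whole script; it becomes the command the task runs.
    script = Path(file_path).read_text()
    # Assumed return shape: a spec dictionary like the one used
    # in the leaf_tasks_from_dict example.
    return {
        "operator": "airflow.operators.bash.BashOperator",
        "bash_command": script,
    }


# Map the file extension to the function that parses it.
parse_hooks = {".sh": parse_shell_file}
```

The resulting dictionary would be passed as parse_hooks=parse_hooks to create_dag or create_dags.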
ignore_subfolders
Will disable the creation of task groups from subfolders when set to True.
(Note that if you only want to ignore some subfolders, you can add a file called .gustyignore to the subfolders you would like ignored.)
render_on_create
Disabled by default. If you want any Jinja in your spec to be rendered on creation, set to True. Note that this will process everything every time the DAG is processed, which by default in Airflow is every few minutes. In general you don’t want this on.
A.2 create_dag Specific Notes
The first argument to create_dag is a path to a single DAG directory containing Task Definition Files.
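A call along these lines might look like the sketch below. The directory path, task ids, and argument values are all hypothetical; the gusty-specific argument names are the ones documented in A.1, and the import is placed inside a function only so the sketch can be read without Airflow installed.

```python
# Hypothetical path to a single DAG directory of Task Definition Files.
DAG_DIR = "/usr/local/airflow/dags/my_dag"

# Keyword arguments: Airflow DAG arguments plus gusty-specific ones.
DAG_KWARGS = {
    # Airflow DAG arguments
    "description": "An example gusty DAG",
    "default_args": {"owner": "airflow", "retries": 1},
    # gusty-specific arguments (see A.1)
    "latest_only": False,
    "root_tasks": ["first_task"],
    "leaf_tasks": ["last_task"],
    "wait_for_defaults": {"mode": "reschedule", "timeout": 3600},
    "task_group_defaults": {"prefix_group_id": False},
}


def build_dag():
    # Imported here so the rest of the sketch does not require gusty.
    from gusty import create_dag

    # First positional argument: the DAG directory; the rest are keywords.
    return create_dag(DAG_DIR, **DAG_KWARGS)
```

In a real DAG file the result would be assigned to a module-level name, for example dag = build_dag(), so that Airflow can discover it.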
A.3 create_dags Specific Notes
The first argument to create_dags is a path to a directory containing multiple DAG directories, each with their own Task Definition Files.
The second argument to create_dags should always be globals(), which will ensure the resulting DAG objects are discoverable by Airflow.
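The two positional arguments can be sketched as follows. The directory path and the wrapper function are hypothetical, and the import sits inside the function only so the sketch can be read without Airflow installed.

```python
# Hypothetical path to a directory that holds several DAG directories.
DAGS_DIR = "/usr/local/airflow/dags"


def build_all_dags(namespace):
    # Imported here so the rest of the sketch does not require gusty.
    from gusty import create_dags

    # First argument: the directory of DAG directories.
    # Second argument: the namespace that should receive the DAG objects.
    create_dags(DAGS_DIR, namespace, latest_only=False)
```

In a real DAG file this would be called at module level as build_all_dags(globals()), or the wrapper could be dropped in favor of calling create_dags(DAGS_DIR, globals(), ...) directly.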