Appendix A — create_dag Arguments

Both create_dag and create_dags can take any keyword arguments available to Airflow’s DAG object. Additionally, there are some gusty-specific arguments for these functions.

Below we will cover all gusty-specific arguments available in create_dag and create_dags, followed by specific create_dag and create_dags considerations. The gusty-specific arguments can also be used in a DAG’s METADATA.yml.

For the best results, it’s recommended to always use keyword arguments with create_dag and create_dags.

A.1 gusty-specific arguments

latest_only

By default, gusty adds a LatestOnlyOperator at the absolute root of your Airflow DAG, which means that - by default - the tasks is your DAG will not run except for the latest DAG run. You can read more about the LatestOnlyOperator in Airflow’s documentation, but setting latest_only=False will ensure a gusty-generated DAG mirrors Airflow’s default behavior.

extra_tags

In addition to any tags set via an Airflow DAG’s tags argument (available - as with any Airflow DAG parameter - in both create_dag and METADATA.yml), gusty will append any tags set in the extra_tags list to the provided tags.

To set extra_tags in your call to create_dag, provide a list like so:

extra_tags=["your", "extra", "tags"]

root_tasks

You can assign certain tasks to be at the beginning of the DAG by declaring root_tasks, a list of task ids. Any task id that is designated as a root task cannot have a dependencies block.

leaf_tasks

You can assign certain tasks to be at the end of the DAG by declaring leaf_tasks, a list of task ids. Any task id that is designated as a leaf task cannot have a dependencies block.

external_dependencies

A list of key value pairs in the format of dag_id: task_id, where the dag_id is some upstream DAG and the task_id is the task in that upstream DAG. When set, gusty will create ExternalTaskSensor tasks and place them at the root of the DAG. Set the task_id to all to wait for the entire upstream DAG to complete. See the section on external dependencies for more details.

dag_constructors

Provide either a list of functions or a dictionary of function names names and functions (much like what you would pass to an Airflow DAG’s user_defined_macros) to have your functions available to you both as YAML constructors with gusty as well as in Airflow anywhere user_defined_macros are accepted.

gusty will consolidate your user_defined_macros and your dag_constructors so that all are available anywhere you’d expect. Really, you can just use the Airflow DAG object’s user_defined_macros for everything.

list format

The list format for dag_constructors would look like this:

dag_constructors=[your_first_func, your_second_func]

The functions would be accessible based on their function name.

dictionary format

The dictionary format for dag_constructors would look like this:

dag_constructors={
  "your_first_func": your_first_func,
  "your_renamed_func": your_second_func
  }

The functions would be accessible by the key name, allowing you to - as illustrated above - renamed your functions if you so desire.

Again, you can just use Airflow’s built-in user_defined_macros argument to achieve this same functionality, of having your macros available to you anywhere.

wait_for_defaults

A dictionary of values that can be passed to an Airflow ExternalTaskSensor (or BaseOperator).

task_group_defaults

A dictionary of values that can be passed to Airflow TaskGroup object.

leaf_tasks_from_dict

A dictionary of tasks that you want at the end of your DAG, where the key is the name of the task, and the value is a spec for that task.

leaf_tasks_from_dict={
  "my_dag_is_done": {
    "operator": "airflow.operators.bash.BashOperator",
    "bash_command": "echo done"
    }
}

parse_hooks

If you want to parse another file type, or want to override how gusty parses supported file types, you can pass a dictionary of file extensions and functions to parse those extensions. Your functions should take a file_path argument.

parse_hooks={
  ".sh": your_shell_file_parsing_function
}

See gusty’s built-in parsers here.

ignore_subfolders

Will disable the creation of task groups from subfolders when set to True.

(Note that if you only want to ignore some subfolders, you can add a file called .gustyignore to those the subfolders you would like ignored.)

render_on_create

Disabled by default. If you want any Jinja in your spec to rendered on creation, set to True. Note that this will process everything every time the DAG is processed, which by default in Airflow is every few minutes. In general you don’t want this on.

A.2 create_dag Specific Notes

The first argument to create_dag is a path to single DAG directory containing Task Definition Files.

A.3 create_dags Specific Notes

The first argument to create_dags is a path to a directory containing multiple DAG directories, each with their own Task Definition Files.

The second argument to create_dag should always be globals(), which will ensure the resulting DAG objects are discoverable by Airflow.