# Fault Tolerance
Flink's fault tolerance mechanism recovers programs in the presence of failures and continues to execute them. Such failures include machine hardware failures, network failures, transient program failures, etc.
## Batch Processing Fault Tolerance (DataSet API)
Fault tolerance for programs in the _DataSet API_ works by retrying failed executions. The number of times that Flink retries the execution before the job is declared as failed is configurable via the _execution retries_ parameter. A value of _0_ effectively means that fault tolerance is deactivated.
To activate fault tolerance, set the _execution retries_ parameter to a value larger than zero. A common choice is a value of three.
This example shows how to configure the execution retries for a Flink DataSet program.
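
A minimal sketch of such a configuration is shown here. It assumes the `setNumberOfExecutionRetries` method on `ExecutionEnvironment` as found in the Flink releases this page describes; later releases deprecate it in favor of restart strategies, so check the API of your version. The class name `ExecutionRetriesExample` is hypothetical.

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class ExecutionRetriesExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Retry a failed execution up to three times before the job is declared failed.
        // A value of 0 would deactivate fault tolerance for this program.
        env.setNumberOfExecutionRetries(3);

        // ... define the DataSet program here, then call env.execute()
    }
}
```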
You can also define default values for the number of execution retries and the retry delay in the `flink-conf.yaml`:
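
A sketch of the corresponding entries follows. The key names `execution-retries.default` and `execution-retries.delay` are the ones used by Flink releases of this generation and are stated here as an assumption; verify them against the configuration reference of your version. The values are illustrative only.

```yaml
# Default number of execution retries for jobs that do not set their own value
execution-retries.default: 3

# Default delay between a failed execution and its re-execution
execution-retries.delay: 10 s
```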
## Retry Delays
Execution retries can be configured to be delayed. Delaying the retry means that after a failed execution, the re-execution does not start immediately, but only after a certain delay.
Delaying the retries can be helpful when the program interacts with external systems where, for example, connections or pending transactions should reach a timeout before re-execution is attempted.
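
The interplay of the retry count and the retry delay can be pictured with a small sketch in plain Java that does not depend on Flink. It is an illustration of the semantics described above, not Flink's internal recovery code: `RetrySketch` and `runWithRetries` are hypothetical names, and the sketch assumes that a retry count of _n_ allows one initial execution plus up to _n_ re-executions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RetrySketch {

    /**
     * Runs the task once and re-runs it up to executionRetries times on failure,
     * waiting retryDelayMillis before each re-execution.
     */
    static <T> T runWithRetries(Supplier<T> task, int executionRetries, long retryDelayMillis) {
        if (executionRetries < 0) {
            throw new IllegalArgumentException("executionRetries must not be negative");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= executionRetries; attempt++) {
            if (attempt > 0) {
                // Give external systems time to time out before re-executing.
                sleep(retryDelayMillis);
            }
            try {
                return task.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try again if retries remain
            }
        }
        // All attempts are used up: the job is declared failed.
        throw lastFailure;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // A task that fails twice with a transient error and then succeeds.
        String result = runWithRetries(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "done";
        }, 3, 100L);
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

With a retry count of zero, the first failure is propagated immediately, which corresponds to fault tolerance being deactivated.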
You can set the retry delay for each program as follows (the sample shows the DataStream API - the DataSet API works similarly):
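
A minimal sketch, assuming the `setExecutionRetryDelay` method on `ExecutionConfig` as found in the Flink releases this page describes (later releases deprecate it in favor of restart strategies). The class name `RetryDelayExample` and the delay value are illustrative.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetryDelayExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Wait 5000 milliseconds after a failed execution before re-executing.
        env.getConfig().setExecutionRetryDelay(5000);

        // ... define the DataStream program here, then call env.execute()
    }
}
```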
You can also define the default value for the retry delay in the `flink-conf.yaml`:
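
A sketch of the entry, assuming the key name `execution-retries.delay` and a duration value written as a number followed by a unit; both should be checked against the configuration reference of your Flink version.

```yaml
# Default delay applied before each re-execution
execution-retries.delay: 10 s
```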