Automatic retry
When a client gets an error response, it might want to retry the request depending on the response. This can be accomplished using a decorator, and Armeria provides two implementations out of the box: RetryingClient and RetryingRpcClient.
Both behave the same except for the request and response types they handle.
So, let's find out what we can do with RetryingClient.
RetryingClient
You can just use the decorator() method in ClientBuilder or WebClientBuilder to build a
RetryingClient. For example:
import com.linecorp.armeria.client.WebClient;
import com.linecorp.armeria.client.retry.RetryingClient;
import com.linecorp.armeria.client.retry.RetryRule;
import com.linecorp.armeria.common.AggregatedHttpResponse;
RetryRule rule = RetryRule.failsafe();
WebClient client = WebClient.builder("http://example.com/hello")
                            .decorator(RetryingClient.newDecorator(rule))
                            .build();

AggregatedHttpResponse res = client.execute(...).aggregate().join();
That's it. The client will keep attempting until it succeeds or the number of attempts exceeds the maximum
number of total attempts. You can configure maxTotalAttempts when creating the decorator using
RetryingClient.newDecorator(). Meanwhile, the rule decides whether to
retry depending on the response. In this case, the client retries when it receives a 5xx (server error)
response or when an exception is raised.
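For example, a minimal sketch that caps the retry session at 3 total attempts, assuming the newDecorator() overload that also accepts maxTotalAttempts:

RetryRule rule = RetryRule.failsafe();
WebClient client = WebClient.builder("http://example.com/hello")
                            // Give up after 3 attempts in total (1 initial try + 2 retries).
                            .decorator(RetryingClient.newDecorator(rule, /* maxTotalAttempts */ 3))
                            .build();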
RetryRule
You can fluently build your own RetryRule.
import com.linecorp.armeria.client.ResponseTimeoutException;
import com.linecorp.armeria.client.retry.Backoff;
import com.linecorp.armeria.common.HttpStatus;

Backoff myBackoff = ...;
RetryRule.of(RetryRule.builder().onUnprocessed().thenBackoff(myBackoff),
             RetryRule.builder().onException(ResponseTimeoutException.class).thenBackoff(),
             RetryRule.builder().onStatus(HttpStatus.TOO_MANY_REQUESTS).thenNoRetry())
Or you can customize the rule by implementing RetryRule.
import java.util.concurrent.CompletionStage;

import com.linecorp.armeria.client.ClientRequestContext;
import com.linecorp.armeria.client.UnprocessedRequestException;
import com.linecorp.armeria.client.retry.Backoff;
import com.linecorp.armeria.client.retry.RetryDecision;
import com.linecorp.armeria.common.ResponseHeaders;
import com.linecorp.armeria.common.logging.RequestLogProperty;
import com.linecorp.armeria.common.util.UnmodifiableFuture;

new RetryRule() {
    final Backoff backoff = Backoff.ofDefault();

    @Override
    public CompletionStage<RetryDecision> shouldRetry(ClientRequestContext ctx,
                                                      @Nullable Throwable cause) {
        if (cause != null) {
            if (cause instanceof ResponseTimeoutException ||
                cause instanceof UnprocessedRequestException) {
                // The response timed out or the request has not been handled
                // by the server.
                return UnmodifiableFuture.completedFuture(RetryDecision.retry(backoff));
            }
        }

        ResponseHeaders responseHeaders =
                ctx.log().ensureAvailable(RequestLogProperty.RESPONSE_HEADERS).responseHeaders();
        if (responseHeaders.status() == HttpStatus.TOO_MANY_REQUESTS) {
            return UnmodifiableFuture.completedFuture(RetryDecision.stop());
        }

        // Return 'next()' to look up other rules.
        return UnmodifiableFuture.completedFuture(RetryDecision.next());
    }
};
This will retry when either ResponseTimeoutException or UnprocessedRequestException is raised.
However, if the response status is 429 Too Many Requests, it will stop retrying.
For all other cases, it will defer to the next rule, if one exists.
We declare a Backoff as a member and reuse it whenever the rule returns it, so that we do not
return a different Backoff instance on each shouldRetry() call. RetryingClient
internally tracks the reference of the returned Backoff and increases the counter that keeps
the number of attempts made so far, resetting it to 0 whenever the Backoff returned by the retry rule
is not the same instance as before. Therefore, it is important to return the same Backoff instance unless
you decide to change your backoff strategy. If you do not, a Backoff that yields a different delay
based on the number of retries, such as an exponential backoff, will not work as expected.
We will take a closer look at Backoff in the next section.
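To illustrate the pitfall, here is a small sketch; the rules are hypothetical and written as lambdas for brevity:

// Anti-pattern: a new Backoff is created on every call, so the internal
// attempt counter is reset to 0 each time and the delay never grows.
RetryRule badRule = (ctx, cause) -> UnmodifiableFuture.completedFuture(
        RetryDecision.retry(Backoff.exponential(200, 10000)));

// Correct: reuse a single Backoff instance so the attempt counter keeps
// growing and an exponential delay increases as expected.
Backoff sharedBackoff = Backoff.exponential(200, 10000);
RetryRule goodRule = (ctx, cause) -> UnmodifiableFuture.completedFuture(
        RetryDecision.retry(sharedBackoff));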
UnprocessedRequestException literally means that the request has not been processed by the server.
Therefore, you can safely retry the request without worrying about the idempotency of the request.
For more information about idempotency, please refer to
What are idempotent and/or safe methods?.
You can return a different Backoff according to the response status.
import com.linecorp.armeria.common.HttpStatusClass;

Backoff backoffOnServerErrorOrTimeout = Backoff.ofDefault();
Backoff backoffOnConflict = Backoff.fixed(100);

RetryRule.builder()
         .onException(ex -> ex instanceof ResponseTimeoutException ||
                            ex instanceof UnprocessedRequestException)
         .thenBackoff(backoffOnServerErrorOrTimeout)
         .orElse(RetryRule.builder()
                          .onStatusClass(HttpStatusClass.SERVER_ERROR)
                          .thenBackoff(backoffOnServerErrorOrTimeout))
         .orElse(RetryRule.builder()
                          .onStatus(HttpStatus.CONFLICT)
                          .thenBackoff(backoffOnConflict));
If you need to look into the response content to determine whether to retry, implement
RetryRuleWithContent and specify it when you create a WebClient
using RetryingClientBuilder:
import com.linecorp.armeria.client.retry.RetryRuleWithContent;
import com.linecorp.armeria.common.HttpResponse;

RetryRuleWithContent<HttpResponse> retryRule =
        RetryRuleWithContent
                .<HttpResponse>builder()
                .onException(ex -> ex instanceof ResponseTimeoutException ||
                                   ex instanceof UnprocessedRequestException)
                .onResponse(response -> {
                    return response.aggregate()
                                   .thenApply(content -> "Should I retry?".equals(content.contentUtf8()));
                })
                .thenBackoff(backoff);

// Create a WebClient with the retry rule.
WebClient client = WebClient.builder(...)
                            .decorator(RetryingClient.builder(retryRule)
                                                     .newDecorator())
                            .build();

AggregatedHttpResponse res = client.execute(...).aggregate().join();
You might find the Exceptions.peel() method useful when the exception you are trying to
handle is wrapped by exceptions like CompletionException and ExecutionException:
import com.linecorp.armeria.common.Exceptions;

@Override
public CompletionStage<RetryDecision> shouldRetry(ClientRequestContext ctx,
                                                  @Nullable Throwable cause) {
    if (cause != null) {
        if (cause instanceof ResponseTimeoutException ||
            cause instanceof UnprocessedRequestException) {
            // The response timed out or the request has not been handled
            // by the server.
            return UnmodifiableFuture.completedFuture(RetryDecision.retry(backoff));
        }

        // Unwrap exceptions such as CompletionException and ExecutionException
        // to get the actual cause.
        Throwable peeled = Exceptions.peel(cause);
        if (peeled instanceof MyException) { ... }
    }
    ...
}
Backoff
You can use a Backoff to determine the delay between attempts. Armeria provides Backoff
implementations which produce the following delays out of the box:
- Fixed delay, created with Backoff.fixed()
- Random delay, created with Backoff.random()
- Exponential delay which is multiplied on each attempt, created with Backoff.exponential()
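For instance, a quick sketch of each factory method; the delay values are arbitrary:

Backoff fixed = Backoff.fixed(1000);                   // 1,000ms on every retry
Backoff random = Backoff.random(200, 1000);            // a random delay between 200ms and 1,000ms
Backoff exponential = Backoff.exponential(200, 10000); // 200, 400, 800, ... capped at 10,000ms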
Armeria provides Backoff.ofDefault() that you might use by default. It is exactly the same as:
Backoff.exponential(200   /* minDelayMillis */,
                    10000 /* maxDelayMillis */,
                    2.0   /* multiplier */)
       .withJitter(0.2 /* jitterRate */);
The delay starts at minDelayMillis and is multiplied by multiplier on every retry until it reaches
maxDelayMillis. Please note that Backoff.withJitter() adds a jitter value to the calculated delay.
For more information, please refer to the API documentation of the com.linecorp.armeria.client.retry package.
maxTotalAttempts vs per-Backoff maxAttempts
If you create a Backoff using Backoff.withMaxAttempts() in a RetryRule,
the RetryingClient which uses that RetryRule stops retrying when the number of
attempts exceeds maxAttempts. However, if you have more than one Backoff and return one after
the other continuously, the client will keep retrying over and over again, because the counter that
RetryingClient internally tracks is reset every time a different Backoff is
returned. To limit the number of attempts for a whole retry session, RetryingClient caps
the maximum number of total attempts at 10 by default. You can change this value by specifying
maxTotalAttempts when you build a RetryingClient:
RetryConfig config = RetryConfig.builder(rule)
                                .maxTotalAttempts(maxTotalAttempts)
                                .build();
RetryingClient.newDecorator(config);
Or, you can override the default value of 10 using the JVM system property
-Dcom.linecorp.armeria.defaultMaxTotalAttempts=<integer>.
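For comparison, the per-Backoff cap mentioned at the beginning of this section is created as in the following sketch; the limit of 3 is arbitrary:

// This Backoff stops producing a delay after 3 attempts; the whole retry
// session is still bounded by maxTotalAttempts.
Backoff limited = Backoff.ofDefault().withMaxAttempts(3);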
Note that when a RetryingClient stops due to the attempts limit, the client will get the last received
Response from the server.
Per-attempt timeout
A ResponseTimeoutException can occur in two different situations while retrying. First, it occurs
when the whole retry session exceeds the response timeout previously configured using:
ClientBuilder.responseTimeoutMillis(millis);
// or..
ClientRequestContext.setResponseTimeoutAfterMillis(millis);
You cannot retry on this ResponseTimeoutException.
Second, it occurs when an individual attempt exceeds the per-attempt timeout.
You can configure it when you create the decorator:
RetryConfig config = RetryConfig.builder(rule)
                                .maxTotalAttempts(maxTotalAttempts)
                                .responseTimeoutMillisForEachAttempt(responseTimeoutMillisForEachAttempt)
                                .build();
RetryingClient.newDecorator(config);
You can retry on this ResponseTimeoutException.
For example, when making a request via RetryingClient to an unresponsive service
with responseTimeoutMillis = 10,000, responseTimeoutMillisForEachAttempt = 3,000 and a disabled
Backoff, the first three attempts are timed out by the per-attempt timeout (3,000ms).
The fourth one is aborted after 1,000ms, because the retry session reaches the 10,000ms deadline
before the per-attempt timeout kicks in.
0ms 3,000ms 6,000ms 9,000ms
| | | |
+-----------+-----------+-----------+----+
| Attempt 1 | Attempt 2 | Attempt 3 | A4 |
+-----------+-----------+-----------+----+
|
10,000ms (ResponseTimeoutException)
In the example above, every attempt is made before it times out because the Backoff is disabled.
However, what if a Backoff is enabled and the next attempt would start after the point of
ResponseTimeoutException? In that case, the RetryingClient does not schedule the
next attempt, but finishes the retry session immediately with the last received Response.
Consider the following example:
0ms 3,000ms 6,000ms 9,000ms 12,000ms
| | | | |
+-----------+-----------+-----------+-----------+-----------------------+
| Attempt 1 | | Attempt 2 | | Attempt 3 is not made |
+-----------+-----------+-----------+----+------+-----------------------+
| |
| 10,000ms (retry session deadline)
|
stops retrying at this point
Unlike the example above, the Backoff is enabled here and makes the RetryingClient retry
with a 3-second delay. When the second attempt finishes at 9,000ms, the next attempt would start
at 12,000ms, exceeding the response timeout of 10,000ms.
At this point the RetryingClient stops retrying and finishes the retry session with the last
received Response, retrieved at 9,000ms from attempt 2.
RetryingClient with logging
You can use RetryingClient together with LoggingClient to log requests and responses. If you want to
log all of the requests and responses, decorate LoggingClient with RetryingClient. That is:
import com.linecorp.armeria.client.logging.LoggingClient;

RetryRule rule = RetryRule.failsafe();
WebClient client = WebClient.builder(...)
                            .decorator(LoggingClient.newDecorator())
                            .decorator(RetryingClient.newDecorator(rule))
                            .build();
This will produce the following logs when there are three attempts:
Request: {startTime=..., length=..., duration=..., scheme=..., host=..., headers=[...]
Response: {startTime=..., length=..., duration=..., headers=[:status=500, ...]
Request: {startTime=..., ..., headers=[..., armeria-retry-count=1, ...]
Response: {startTime=..., length=..., duration=..., headers=[:status=500, ...]
Request: {startTime=..., ..., headers=[..., armeria-retry-count=2, ...]
Response: {startTime=..., length=..., duration=..., headers=[:status=200, ...]
Did you notice that the armeria-retry-count header is inserted from the second request?
RetryingClient inserts it to indicate the retry count of a request.
The server might use this value to reject excessive retries, etc.
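For example, a hypothetical server-side sketch that rejects requests retried more than twice; the service and the threshold are made up for illustration:

import com.linecorp.armeria.common.HttpResponse;
import com.linecorp.armeria.common.HttpStatus;
import com.linecorp.armeria.server.HttpService;

// A made-up service that uses 'armeria-retry-count' to reject excessive retries.
HttpService service = (ctx, req) -> {
    final int retryCount = req.headers().getInt("armeria-retry-count", 0);
    if (retryCount > 2) {
        return HttpResponse.of(HttpStatus.TOO_MANY_REQUESTS);
    }
    return HttpResponse.of("Hello!");
};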
If you want to log only the first request and the last response, whether it is successful or not, do the reverse:
import com.linecorp.armeria.client.logging.LoggingClient;

RetryRule rule = RetryRule.failsafe();

// Note the order of decoration.
WebClient client = WebClient.builder(...)
                            .decorator(RetryingClient.newDecorator(rule))
                            .decorator(LoggingClient.newDecorator())
                            .build();
This will produce a single request and response log pair, with only the total number of attempts, regardless of how many attempts are made:
Request: {startTime=..., length=..., duration=..., scheme=..., host=..., headers=[...]
Response: {startTime=..., length=..., headers=[:status=200, ...]}, {totalAttempts=3}
Please refer to Nested log if you are curious about how this works internally.
RetryingClient with circuit breaker
You might want to use Circuit breaker together with RetryingClient, combining the two via
Decorating a client:
import com.linecorp.armeria.client.circuitbreaker.CircuitBreakerClient;
import com.linecorp.armeria.client.circuitbreaker.CircuitBreakerRule;

CircuitBreakerRule cbRule = CircuitBreakerRule.onServerErrorStatus();
RetryRule myRetryRule = RetryRule.builder()
                                 ...
                                 .build();

WebClient client = WebClient.builder(...)
                            .decorator(CircuitBreakerClient.builder(cbRule)
                                                           .newDecorator())
                            .decorator(RetryingClient.builder(myRetryRule)
                                                     .newDecorator())
                            .build();

AggregatedHttpResponse res = client.execute(...).aggregate().join();
This decorates CircuitBreakerClient with RetryingClient so that the CircuitBreaker
judges every request and retried request as successful or failed. If the failure rate exceeds a certain
threshold, it raises a FailFastException. When using both clients, you need to build a custom
RetryRule to handle this exception so that the RetryingClient does not attempt
a retry unnecessarily when the circuit is open, e.g.
import com.linecorp.armeria.client.circuitbreaker.FailFastException;

RetryRule.of(RetryRule.builder()
                      // The circuit is already open, so stop retrying.
                      .onException(FailFastException.class)
                      .thenNoRetry(),
             RetryRule.builder()
                      .onException(ex -> ex instanceof ResponseTimeoutException ||
                                         ex instanceof UnprocessedRequestException)
                      .thenBackoff(),
             // Implement the rest of your own rule.
             ...);
You may want to allow retrying even on FailFastException when your endpoint is configured with
client-side load balancing because the next attempt might be sent to the next available endpoint.
See Client-side load balancing and service discovery
for more information.
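In that case, a minimal sketch might look like the following; whether it fits depends on your load-balancing setup:

// Retry on FailFastException without delay so that the next attempt can be
// routed to another available endpoint by the client-side load balancer.
RetryRule retryOnOpenCircuit = RetryRule.builder()
                                        .onException(FailFastException.class)
                                        .thenBackoff(Backoff.withoutDelay());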