Sunday, October 22, 2017

Raw performance numbers - Spring Boot 2 Webflux vs Spring Boot 1


A Spring Boot 2 application based on Spring Webflux outperforms a Spring Boot 1 application by a huge margin for IO-heavy workloads. The following is a summarized result of a load test, showing the response time for an IO-heavy transaction with varying numbers of concurrent users:

When the number of concurrent users remains low (say, fewer than 1000), both Spring Boot 1 and Spring Boot 2 handle the load well, and the 95th percentile response time stays only a few milliseconds above the expected value of 300 ms.

At higher concurrency levels, the async non-blocking IO and reactive support in Spring Boot 2 start to show their strength: even with a very heavy load of 5000 users, the 95th percentile response time remains at around 312 ms! Spring Boot 1 records a lot of failures and high response times at these concurrency levels.
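For readers unfamiliar with the metric, the 95th percentile is the response time that 95% of the requests stay at or below. Here is a minimal Java sketch using the nearest-rank method. The class name and the sample values are made up for illustration, and Gatling's own percentile calculation may differ in detail.

```java
import java.util.Arrays;

final class Percentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical response times in milliseconds, not measured data.
        long[] samples = {301, 303, 302, 305, 310, 304, 306, 308, 312, 900};
        System.out.println("p50 = " + percentile(samples, 50) + " ms");
        System.out.println("p95 = " + percentile(samples, 95) + " ms");
    }
}
```

A single slow outlier is enough to pull the 95th percentile of a small sample far away from the median, which is why the metric is a useful indicator of tail latency under load.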


My set-up for the performance test is the following:

The sample applications expose an endpoint (/passthrough/message) which in turn calls a downstream service. The request message to the endpoint looks something like this:

{
  "id": "1",
  "payload": "sample payload",
  "delay": 3000
}

The downstream service delays its response by the number of milliseconds given in the "delay" attribute of the message.
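To make the downstream behaviour concrete, here is a minimal Java sketch of a service that acknowledges a message only after the requested delay. The `DelayedAckService` class, the fields of `MessageAck`, and the use of a scheduler are assumptions for illustration; the actual downstream service used in the test may be implemented differently.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Mirrors the sample request: id, payload, and delay in milliseconds.
final class Message {
    final String id;
    final String payload;
    final long delay;

    Message(String id, String payload, long delay) {
        this.id = id;
        this.payload = payload;
        this.delay = delay;
    }
}

// Hypothetical acknowledgement type; its real fields are not shown in the post.
final class MessageAck {
    final String id;
    final String received;

    MessageAck(String id, String received) {
        this.id = id;
        this.received = received;
    }
}

final class DelayedAckService implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Completes the returned future after message.delay milliseconds
    // without holding a thread for the duration of the wait.
    CompletableFuture<MessageAck> handle(Message message) {
        CompletableFuture<MessageAck> ack = new CompletableFuture<>();
        scheduler.schedule(
                () -> { ack.complete(new MessageAck(message.id, message.payload)); },
                message.delay, TimeUnit.MILLISECONDS);
        return ack;
    }

    @Override
    public void close() {
        scheduler.shutdown();
    }

    public static void main(String[] args) {
        try (DelayedAckService service = new DelayedAckService()) {
            long start = System.nanoTime();
            MessageAck ack = service.handle(new Message("1", "sample payload", 300)).join();
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("ack for " + ack.id + " after ~" + elapsedMs + " ms");
        }
    }
}
```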

Spring Boot 1 Application

I have used Spring Boot 1.5.8.RELEASE for the Boot 1 version of the application. The endpoint is a simple Spring MVC controller which in turn uses Spring's RestTemplate to make the downstream call. Everything is synchronous and blocking and I have used the default embedded Tomcat container as the runtime. This is the raw code for the downstream call:

public MessageAck handlePassthrough(Message message) {
    // Blocking call: the request thread waits here until the downstream service responds.
    ResponseEntity<MessageAck> responseEntity = this.restTemplate.postForEntity(
            targetHost + "/messages", message, MessageAck.class);
    return responseEntity.getBody();
}

Spring Boot 2 Application

The Spring Boot 2 version of the application exposes a Spring Webflux based endpoint and uses WebClient, the new non-blocking, reactive alternative to RestTemplate, to make the downstream call. I have also used Kotlin for the implementation, which has no bearing on the performance. The runtime server is Netty:

import org.springframework.http.HttpHeaders
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.BodyInserters.fromObject
import org.springframework.web.reactive.function.client.ClientResponse
import org.springframework.web.reactive.function.client.WebClient
import org.springframework.web.reactive.function.client.bodyToMono
import org.springframework.web.reactive.function.server.ServerRequest
import org.springframework.web.reactive.function.server.ServerResponse
import org.springframework.web.reactive.function.server.bodyToMono
import reactor.core.publisher.Mono

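class PassThroughHandler(private val webClient: WebClient) {

    // Reads the request body and relays it to the downstream service.
    fun handle(serverRequest: ServerRequest): Mono<ServerResponse> {
        val messageMono = serverRequest.bodyToMono<Message>()

        return messageMono.flatMap { message ->
            passThrough(message)
                    .flatMap { messageAck ->
                        ServerResponse.ok().body(fromObject(messageAck))
                    }
        }
    }

    // Non-blocking POST to the downstream service; "/messages" is the path
    // used by the Boot 1 version above and is assumed to apply here too.
    fun passThrough(message: Message): Mono<MessageAck> {
        return webClient.post()
                .uri("/messages")
                .header(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .header(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
                .body(fromObject<Message>(message))
                .exchange()
                .flatMap { response: ClientResponse ->
                    response.bodyToMono<MessageAck>()
                }
    }
}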
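
The difference between the two styles can be illustrated without Spring at all. The sketch below is a plain-Java simulation, not the code used in the test: it serves the same number of simulated requests first with a small pool of blocking threads, and then with a single timer thread that never blocks. The class name, pool size, request count, and delay are arbitrary illustrative values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class BlockingVsNonBlocking {

    // Thread-per-request style: each request occupies a pool thread
    // for the whole downstream delay. Returns the elapsed time in ms.
    static long runBlocking(int requests, int threads, long delayMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            pending.add(CompletableFuture.runAsync(() -> sleep(delayMs), pool));
        }
        pending.forEach(CompletableFuture::join);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdown();
        return elapsedMs;
    }

    // Non-blocking style: one timer thread completes every request when
    // its delay expires; no thread waits on an individual request.
    static long runNonBlocking(int requests, long delayMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            CompletableFuture<Void> done = new CompletableFuture<>();
            timer.schedule(() -> { done.complete(null); }, delayMs, TimeUnit.MILLISECONDS);
            pending.add(done);
        }
        pending.forEach(CompletableFuture::join);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        timer.shutdown();
        return elapsedMs;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int requests = 40;   // illustrative values, not the test's settings
        long delayMs = 50;
        System.out.println("blocking, 4 threads: " + runBlocking(requests, 4, delayMs) + " ms");
        System.out.println("non-blocking, 1 thread: " + runNonBlocking(requests, delayMs) + " ms");
    }
}
```

With 40 requests, 4 threads, and a 50 ms delay, the blocking variant needs at least ten rounds of 50 ms, while the non-blocking variant finishes in roughly one delay period.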

Details of the Performance Test

The test is simple: for different numbers of concurrent users (300, 1000, 1500, 3000, 5000), I send a message with the delay attribute set to 300 ms. Each user repeats the scenario 30 times, pausing for 1 to 2 seconds between requests. I am using the excellent Gatling tool to generate this load.
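
A rough back-of-the-envelope calculation suggests why a blocking application could be expected to struggle somewhere above 1000 users. The sketch below assumes that the embedded Tomcat runs with its default limit of 200 worker threads, that each request holds a thread for the full 300 ms downstream delay, and that users pause 1.5 seconds on average. It ignores all other overhead, so treat the numbers as estimates and not as measurements.

```java
final class LoadEstimate {
    // Upper bound on throughput when every in-flight request holds a thread.
    static double blockingCapacityPerSec(int threads, double serviceTimeSec) {
        return threads / serviceTimeSec;
    }

    // Request rate generated by a fixed set of users, each looping over
    // "send request, wait for the response, pause".
    static double offeredLoadPerSec(int users, double serviceTimeSec, double pauseSec) {
        return users / (serviceTimeSec + pauseSec);
    }

    public static void main(String[] args) {
        // Assumed: 200 worker threads, 0.3 s per request, 1.5 s average pause.
        double capacity = blockingCapacityPerSec(200, 0.3);
        System.out.printf("blocking capacity: %.0f req/s%n", capacity);
        for (int users : new int[] {300, 1000, 1500, 3000, 5000}) {
            double offered = offeredLoadPerSec(users, 0.3, 1.5);
            System.out.printf("%d users -> %.0f req/s (%s)%n", users, offered,
                    offered <= capacity ? "within capacity" : "exceeds capacity");
        }
    }
}
```

Under these assumptions, 1000 users stay below the estimated ceiling of about 667 requests per second, while 1500 users and above exceed it.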


These are the results as captured by Gatling:

[Figures: Gatling reports for Boot 1 and Boot 2 at 300, 1000, 1500, 3000, and 5000 concurrent users]


The sample application and the load scripts are available in my github repo -


  1. It's not fair, since Spring Boot 1 has an embedded Tomcat which is not tuned.

    1. First, who tunes Tomcat? Second, will that matter? Is Netty tuned?

  2. I like the post, but when I tried to run this on my local computer, my system looks like the bottleneck in the Spring Boot 2 solution when testing above 1500 concurrent users. I am not sure why, but my CPU load is limited to 50% while running Gatling. Is there some limitation that allows only 50% of each core to be used? My CPU is a 6-core i7-4930K @ 3.40GHz. Gatling uses all 12 logical cores, but does not exceed 50% on any single core.