
Your Reranker Performance: The Ettin Family Delivers State-of-the-Art Speed

Six new Ettin Reranker Family CrossEncoder models provide state-of-the-art performance, optimizing your search accuracy and infrastructure costs through an efficient distillation recipe.

Admin
May 21, 2026

Editorial Note

Reviewed and analyzed by the ScoRpii Tech Editorial Team.

Introducing the Ettin Reranker Family

You can now add reranking to your existing retrieval-augmented generation (RAG) pipelines or semantic search systems with the Ettin Reranker Family. The family comprises six new Sentence Transformers CrossEncoder models, built on the Ettin ModernBERT encoders and trained with a distillation recipe. According to the announcement, you can deploy these models with just three lines of code, which keeps the integration path straightforward.
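As a rough illustration of that integration path, the sketch below loads a CrossEncoder with Sentence Transformers and ranks documents through its `rank` method. The model ID is a placeholder, not a confirmed repository name, so substitute one of the released Ettin reranker checkpoints. The helper names `load_reranker` and `rerank` are invented for this example.

```python
from typing import List, Tuple

# Placeholder (assumption): replace with a real Ettin reranker checkpoint name.
MODEL_ID = "<ettin-reranker-model-id>"


def load_reranker(model_id: str = MODEL_ID):
    """Load a Sentence Transformers CrossEncoder (downloads weights on first use)."""
    # Imported lazily so the rest of the module works without the library installed.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_id)


def rerank(model, query: str, documents: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k documents with their relevance scores, best first."""
    # CrossEncoder.rank scores every (query, document) pair and sorts by score.
    hits = model.rank(query, documents, top_k=top_k, return_documents=True)
    return [(hit["text"], float(hit["score"])) for hit in hits]
```

Once a real model ID is filled in, `rerank(load_reranker(), "your query", candidate_docs)` returns the highest-scoring candidates as `(text, score)` tuples.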

The models were trained on the cross-encoder/ettin-reranker-v1-data dataset, and the released checkpoints remain standard Sentence Transformers CrossEncoder models. The stated goal of the "Introducing the Ettin Reranker Family" announcement was to release the models, their training data, and the full recipe that produced them, making high-performance reranking more accessible for your engineering efforts.

Understanding CrossEncoder Reranking

If you're operating complex information retrieval systems, you've likely encountered the challenge of improving search result relevance beyond initial candidate generation. A CrossEncoder reranker addresses this by taking a query and a set of candidate documents, then scoring each (query, document) pair individually. Unlike a Bi-Encoder, which processes query and document separately into independent embedding vectors, a CrossEncoder performs a joint computation.

This allows it to model the direct interaction and contextual relationship between the query and each document, leading to a more nuanced and accurate relevance score. The trade-off for this increased accuracy is typically higher computational cost, as each candidate document requires a separate forward pass with the query through the model. However, for critical stages like the final reranking of a smaller set of top-k candidates, the precision benefits often outweigh the added latency.
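The reranking stage described above can be sketched in plain Python. The pair scorer here is a toy token-overlap function standing in for a real CrossEncoder, and the names `token_overlap` and `rerank_top_k` are hypothetical. The point is the shape of the computation: one scoring call per candidate, so cost grows linearly with the size of the candidate set.

```python
from typing import Callable, List, Tuple

# Any function that jointly scores a (query, document) pair.
PairScorer = Callable[[str, str], float]


def token_overlap(query: str, document: str) -> float:
    """Toy stand-in for a CrossEncoder: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rerank_top_k(
    query: str,
    candidates: List[str],
    score_pair: PairScorer,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each candidate against the query and keep the top_k best."""
    # One scorer call per candidate: this is the per-pair cost of a CrossEncoder.
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

In a real pipeline, `candidates` would be the top-k results from a fast first-stage retriever, and `score_pair` would wrap a CrossEncoder so the expensive joint computation only runs on that short list.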

Performance Metrics and Operational Impact

The performance gains from this new family of rerankers are notable, particularly in computational efficiency. According to the announcement, running in bf16 (bfloat16) precision with Flash Attention 2 (FA2) and no padding is significantly faster than a baseline that uses fp32 (float32) precision with scaled dot-product attention (SDPA). The total speedup of bf16+FA2 without padding over the fp32+SDPA baseline grows sharply with model size.

Key performance metrics include:

  • 1.71x speedup on the 17M parameter model
  • 8.26x speedup on the 1B parameter model

Such efficiency improvements directly translate into reduced inference latency and lower operational costs for your reranking infrastructure.
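To make these multipliers concrete, the arithmetic below converts a speedup into latency and time saved. Only the two speedup figures come from the announcement. The 100 ms baseline in the usage note is a made-up number for illustration, and the function names are hypothetical.

```python
# Speedup figures quoted from the announcement (bf16+FA2 without padding vs fp32+SDPA).
SPEEDUPS = {"17M": 1.71, "1B": 8.26}


def optimized_latency_ms(baseline_ms: float, speedup: float) -> float:
    """Latency after applying a speedup factor to a baseline latency."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_ms / speedup


def time_saved_fraction(speedup: float) -> float:
    """Fraction of the baseline time that the speedup removes."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup
```

For example, with an assumed 100 ms baseline per batch, `optimized_latency_ms(100, SPEEDUPS["1B"])` comes to about 12.1 ms. In relative terms, the two reported speedups remove roughly 42% and 88% of the baseline time, respectively.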

What This Means For Your Operations

For your engineering and DevOps teams, the Ettin Reranker Family provides a direct path to implement state-of-the-art reranking capabilities. The fact that these are standard Sentence Transformers CrossEncoder models simplifies adoption, removing much of the friction often associated with integrating advanced deep learning models. You can leverage the provided training recipe and data to understand the methodology or even adapt it for domain-specific fine-tuning.

The reported speedups from the bf16+FA2 optimization address a common bottleneck in deploying transformer-based rerankers: the computational expense of high-quality inference. On GPUs that support bf16 and Flash Attention 2, this can mean higher throughput on existing hardware, postponing or reducing the need for additional GPU investment for your reranking layer.
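One plausible way to request bf16 and Flash Attention 2 when loading a CrossEncoder is to pass loading options through `model_kwargs`, as sketched below. This is an assumption about your setup, not the announcement's exact configuration: it presumes a recent Sentence Transformers release that accepts `model_kwargs`, a Transformers version that accepts these keys, an installed `flash-attn` package, and a compatible GPU. The padding-free batching mentioned in the announcement is not shown here, and the helper names are hypothetical.

```python
from typing import Any, Dict


def build_model_kwargs(use_bf16: bool = True, use_flash_attention: bool = True) -> Dict[str, Any]:
    """Build loading options that are forwarded to the underlying Transformers model."""
    kwargs: Dict[str, Any] = {}
    if use_bf16:
        # Load weights in bfloat16 instead of the default float32.
        kwargs["torch_dtype"] = "bfloat16"
    if use_flash_attention:
        # Ask Transformers for the Flash Attention 2 kernel (needs the flash-attn package).
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_optimized_reranker(model_id: str, device: str = "cuda"):
    """Load a CrossEncoder with the options above; model_id is supplied by the caller."""
    # Imported lazily so build_model_kwargs can be used without the library installed.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_id, device=device, model_kwargs=build_model_kwargs())
```

Keeping the options in a separate function makes it easy to fall back to the defaults, for instance `build_model_kwargs(use_flash_attention=False)` on hardware without Flash Attention 2 support.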

The Bottom Line for Developers

In short, the Ettin Reranker Family offers a straightforward integration path and notable inference speedups for CrossEncoder reranking. You can use these models to improve the final relevance ranking in applications ranging from internal knowledge bases to customer-facing search interfaces, while keeping reranking latency in check.

Originally reported by

Hugging Face Blog
