diff --git a/03-standalone-api/03-rerank/README.md b/03-standalone-api/03-rerank/README.md
new file mode 100644
index 0000000..e9a06f2
--- /dev/null
+++ b/03-standalone-api/03-rerank/README.md
@@ -0,0 +1,69 @@
+# Contextual AI Reranker Examples
+
+This folder contains examples demonstrating how to use Contextual AI's reranker, the first reranker with instruction-following capabilities for handling conflicts in retrieval. Per industry-leading benchmarks such as BEIR, it is the most accurate reranker available.
+
+## 📁 Contents
+
+### 1. `rerank.ipynb` - Basic Reranker Usage
+A comprehensive tutorial showing different ways to use the Contextual AI reranker:
+
+- **REST API implementation** - Direct API calls using the `requests` library (see the minimal sketch at the end of this section)
+- **Python SDK** - Using the official `contextual-client` package
+- **Langchain integration** - Using the `langchain-contextual` package
+
+**Key Features Demonstrated:**
+- Query reranking with custom instructions
+- Document metadata handling
+- Multiple integration methods
+- Example use case: enterprise pricing queries over internal sales documents
+
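+For the REST API path, here is a minimal sketch using the `requests` library. The endpoint URL and Bearer-token header are assumptions based on the API reference linked below; see `rerank.ipynb` for the exact call.
+
+```python
+import requests
+
+# Assumed endpoint; confirm against the API reference linked below
+url = "https://api.contextual.ai/v1/rerank"
+
+headers = {
+    "Authorization": "Bearer your-api-key",  # assumed auth scheme
+    "Content-Type": "application/json",
+}
+
+payload = {
+    "query": "What is the enterprise pricing for RTX 5090?",
+    "instruction": "Prioritize internal sales documents over market reports",
+    "documents": ["Document 1", "Document 2", "Document 3"],
+    "model": "ctxl-rerank-v2-instruct-multilingual",
+}
+
+response = requests.post(url, headers=headers, json=payload)
+print(response.json())
+```
+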
+### 2. `reranker_benchmarking.ipynb` - Performance Evaluation
+A robust evaluation framework for testing the Contextual AI reranker against standard benchmarks:
+
+- **Dataset Support** - Evaluation on Hugging Face datasets including:
+ - touche2020
+ - msmarco
+ - treccovid
+ - nq (Natural Questions)
+ - hotpotqa
+ - fiqa2018
+
+- **Comprehensive Metrics** - Proper evaluation with `pytrec_eval` (see the toy example after this list), covering:
+ - NDCG@10 (Normalized Discounted Cumulative Gain)
+ - MAP (Mean Average Precision)
+ - Recall@10
+ - MRR (Mean Reciprocal Rank)
+
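+The snippet below is a toy illustration of how the benchmarking notebook computes these metrics with `pytrec_eval`: relevance judgments (qrels) and reranker scores (a run) go in, per-query metrics come out and are averaged. The qrels and scores here are made up, and only a subset of the measures is requested.
+
+```python
+import numpy as np
+import pytrec_eval
+
+# Made-up relevance judgments (qrels) and reranker scores (run) for two queries
+qrels = {
+    "q1": {"d1": 1, "d2": 0, "d3": 2},
+    "q2": {"d4": 1, "d5": 1},
+}
+run = {
+    "q1": {"d1": 0.9, "d2": 0.2, "d3": 0.7},
+    "q2": {"d4": 0.1, "d5": 0.8},
+}
+
+evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg_cut.10", "recip_rank"})
+per_query = evaluator.evaluate(run)
+
+# Average each metric across queries, as the notebook does
+for metric in next(iter(per_query.values())):
+    print(metric, np.mean([q[metric] for q in per_query.values()]))
+```
+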
+## 🎯 Available Models
+
+The current reranker models include:
+- `ctxl-rerank-v2-instruct-multilingual` - Full model with multilingual support
+- `ctxl-rerank-v2-instruct-multilingual-mini` - Faster mini version
+- `ctxl-rerank-v1-instruct` - Previous generation model
+
+## 🔗 Learn More
+
+- [Contextual AI Reranker Blog Post](https://contextual.ai/blog/introducing-instruction-following-reranker/)
+- [Open Sourcing Rerank v2](https://contextual.ai/blog/rerank-v2/)
+- [API Documentation](https://docs.contextual.ai/api-reference/rerank/rerank)
+- [Python SDK Documentation](https://github.com/ContextualAI/contextual-client-python/blob/main/api.md#rerank)
+- [Langchain Package](https://pypi.org/project/langchain-contextual/)
+
+## 📝 Example Usage
+
+```python
+from contextual import ContextualAI
+
+client = ContextualAI(api_key="your-api-key")
+
+rerank_response = client.rerank.create(
+ query="What is the enterprise pricing for RTX 5090?",
+ instruction="Prioritize internal sales documents over market reports",
+ documents=["Document 1", "Document 2", "Document 3"],
+ model="ctxl-rerank-v2-instruct-multilingual"
+)
+
+print(rerank_response.to_dict())
+```
+
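+Each entry in the response's `results` list reports the original document `index` and its `relevance_score` (the field names used in `reranker_benchmarking.ipynb`), so you can reorder your documents accordingly:
+
+```python
+results = rerank_response.to_dict()["results"]
+for r in sorted(results, key=lambda r: r["relevance_score"], reverse=True):
+    print(r["index"], r["relevance_score"])
+```
+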
+Start with `rerank.ipynb` for basic usage, then explore `reranker_benchmarking.ipynb` for advanced evaluation and performance testing.
diff --git a/03-standalone-api/03-rerank/rerank.ipynb b/03-standalone-api/03-rerank/rerank.ipynb
index 20aa3af..d111161 100644
--- a/03-standalone-api/03-rerank/rerank.ipynb
+++ b/03-standalone-api/03-rerank/rerank.ipynb
@@ -17,6 +17,11 @@
"\n",
"This notebook demonstrates how to use the reranker with the Contextual API directly, our Python SDK, and our Langchain package. We'll use the same example throughout.\n",
"\n",
+ "The current reranker models include: \n",
+ "- ctxl-rerank-v2-instruct-multilingual \n",
+ "- ctxl-rerank-v2-instruct-multilingual-mini\n",
+ "- ctxl-rerank-v1-instruct\n",
+ "\n",
"
\n",
"\n",
"[](https://colab.research.google.com/github/ContextualAI/examples/blob/main/03-standalone-api/03-rerank/rerank.ipynb)"
@@ -72,7 +77,7 @@
" \"January 25, 2025; NVIDIA Enterprise Sales Portal; Internal Use Only\"\n",
"]\n",
"\n",
- "model = \"ctxl-rerank-en-v1-instruct\""
+ "model = \"ctxl-rerank-v2-instruct-multilingual\""
]
},
{
diff --git a/03-standalone-api/03-rerank/reranker_benchmarking.ipynb b/03-standalone-api/03-rerank/reranker_benchmarking.ipynb
new file mode 100644
index 0000000..fc25ea4
--- /dev/null
+++ b/03-standalone-api/03-rerank/reranker_benchmarking.ipynb
@@ -0,0 +1,510 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FNsUqYk1fyxc"
+ },
+ "source": [
+ "# Contextual AI Reranker Evaluation Notebook\n",
+ "\n",
+ "## Overview\n",
+ "This notebook demonstrates how to evaluate the Contextual AI reranker using datasets from Hugging Face, with proper metrics calculation including NDCG@10, MAP, and Recall.\n",
+ "\n",
+ "### Key Features:\n",
+ "- 🎯 Evaluation on Hugging Face datasets\n",
+ "- 📊 Comprehensive metrics (NDCG@10, MAP, Recall@10, MRR)\n",
+ "- ⚡ Fast performance benchmarking\n",
+ "- 🔧 Robust evaluation framework with pytrec_eval\n",
+ "\n",
+ "
\n",
+ "\n",
+ "[](https://colab.research.google.com/github/ContextualAI/examples/blob/main/03-standalone-api/03-rerank/rerank_benchmarking.ipynb)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "iOJNP0Sjfyxg"
+ },
+ "source": [
+ "## 1. Setup and Installation\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Sk5-RuXNfyxg"
+ },
+ "outputs": [],
+ "source": [
+ "%pip install datasets pytrec_eval contextual-client numpy -q"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "IGy4V7xPfyxi"
+ },
+ "outputs": [],
+ "source": [
+ "import pytrec_eval\n",
+ "import numpy as np\n",
+ "from typing import List\n",
+ "from datasets import load_dataset\n",
+ "from contextual import ContextualAI\n",
+ "import time\n",
+ "import os"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "flwsJ2Yffyxj"
+ },
+ "outputs": [],
+ "source": [
+ "# Set your API keys here\n",
+ "\n",
+ "# Get Hugging Face token\n",
+ "HF_TOKEN = os.getenv(\"hf_key)\n",
+ "\n",
+ "# Get Contextual AI API key\n",
+ "CONTEXTUAL_API_KEY = os.getenv(\"CONTEXTUAL_API_KEY\")\n",
+ "\n",
+ "# Initialize Contextual AI client\n",
+ "from contextual import ContextualAI\n",
+ "client = ContextualAI(api_key=CONTEXTUAL_API_KEY)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DpvewO1vfyxj"
+ },
+ "source": [
+ "## 2. Select and Load Dataset\n",
+ "\n",
+ "Available datasets modified for reranking analysis available on Hugging Face:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "xLyV-kxtfyxk"
+ },
+ "outputs": [],
+ "source": [
+ "# Available datasets for evaluation\n",
+ "AVAILABLE_DATASETS = {\n",
+ " \"touche2020\": \"ContextualAI/touche2020\",\n",
+ " \"msmarco\": \"ContextualAI/msmarco\",\n",
+ " \"treccovid\": \"ContextualAI/treccovid\",\n",
+ " \"nq\": \"ContextualAI/nq\",\n",
+ " \"hotpotqa\": \"ContextualAI/hotpotqa\",\n",
+ " \"fiqa2018\": \"ContextualAI/fiqa2018\"\n",
+ "}\n",
+ "\n",
+ "# Select which dataset to use\n",
+ "DATASET_NAME = \"touche2020\" # Change this to use a different dataset\n",
+ "\n",
+ "print(f\"Selected dataset: {AVAILABLE_DATASETS[DATASET_NAME]}\")\n",
+ "\n",
+ "# Load the dataset\n",
+ "dataset = load_dataset(AVAILABLE_DATASETS[DATASET_NAME], token=HF_TOKEN)\n",
+ "print(f\"✅ Loaded {len(dataset['test'])} test examples\")\n",
+ "\n",
+ "# Show example\n",
+ "example = dataset['test'][0]\n",
+ "print(f\"\\nExample query: {example['query'][:100]}...\")\n",
+ "print(f\"Number of candidates: {len(example['candidate_docs'])}\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Pl3LfGyjfyxk"
+ },
+ "source": [
+ "## 3. Define Evaluation Framework\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "q561PqPyfyxl"
+ },
+ "outputs": [],
+ "source": [
+ "def evaluate_reranker_robust(dataset, reranker_func, eval_strings=None):\n",
+ " \"\"\"\n",
+ " Robust evaluation function that handles different pytrec_eval metric naming conventions\n",
+ " \"\"\"\n",
+ " if eval_strings is None:\n",
+ " eval_strings = {\"ndcg_cut.10\", \"map\", \"recall_10\"}\n",
+ "\n",
+ " qrels, results = {}, {}\n",
+ "\n",
+ " for sample in dataset:\n",
+ " qid = str(sample[\"_id\"])\n",
+ " query = sample[\"query\"]\n",
+ " candidate_docs = sample[\"candidate_docs\"]\n",
+ " candidate_ids = sample[\"candidate_ids\"]\n",
+ " gt_ids = sample[\"gt_ids\"]\n",
+ " gt_qrels = sample[\"gt_qrels\"]\n",
+ "\n",
+ " # Get scores from reranker\n",
+ " candidate_scores = reranker_func(query, candidate_docs, candidate_ids)\n",
+ "\n",
+ " # Prepare qrels (ground truth relevance judgments)\n",
+ " qrels[qid] = {str(t_id): int(_qrel) for t_id, _qrel in zip(gt_ids, gt_qrels)}\n",
+ "\n",
+ " # Prepare results (candidate scores)\n",
+ " results[qid] = {str(cid): float(score) for cid, score in zip(candidate_ids, candidate_scores)}\n",
+ "\n",
+ " # Ensure non-empty qrels for pytrec_eval\n",
+ " for qid in list(qrels.keys()):\n",
+ " if len(qrels[qid]) == 0:\n",
+ " qrels[qid] = {\"dummy_id_for_pytrec_eval\": 1}\n",
+ "\n",
+ " # Try to evaluate with the requested metrics\n",
+ " try:\n",
+ " evaluator = pytrec_eval.RelevanceEvaluator(qrels, eval_strings)\n",
+ " scores = evaluator.evaluate(results)\n",
+ "\n",
+ " # Get the actual metric names from the first result\n",
+ " if scores:\n",
+ " first_score = list(scores.values())[0]\n",
+ " actual_metrics = list(first_score.keys())\n",
+ " print(f\"Successfully computed metrics: {actual_metrics}\")\n",
+ "\n",
+ " # Calculate average metrics using the actual metric names\n",
+ " avg_scores = {}\n",
+ " for metric in actual_metrics:\n",
+ " values = [v[metric] for v in scores.values()]\n",
+ " avg_scores[f\"avg_{metric}\"] = np.mean(values) if values else 0.0\n",
+ "\n",
+ " return avg_scores\n",
+ " else:\n",
+ " print(\"No scores returned from pytrec_eval\")\n",
+ " return {}\n",
+ "\n",
+ " except Exception as e:\n",
+ " print(f\"Error with pytrec_eval: {e}\")\n",
+ " print(\"Falling back to simple evaluation...\")\n",
+ " return evaluate_simple_fallback(dataset, reranker_func)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "c5ffLAODfyxl"
+ },
+ "source": [
+ "## 4. Contextual AI Reranker Function\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "CpyDWMBSfyxl"
+ },
+ "outputs": [],
+ "source": [
+ "def contextual_ai_reranker(query: str, candidate_docs: List[str], candidate_ids: List[str]) -> List[float]:\n",
+ " \"\"\"\n",
+ " Contextual AI reranker implementation with FIXED score extraction\n",
+ "\n",
+ " Args:\n",
+ " query: The search query\n",
+ " candidate_docs: List of candidate document texts\n",
+ " candidate_ids: List of candidate document IDs\n",
+ "\n",
+ " Returns:\n",
+ " List of relevance scores for each candidate document\n",
+ " \"\"\"\n",
+ " try:\n",
+ " # Optional: Add instruction for the reranker\n",
+ " instruction = \"\"\n",
+ "\n",
+ " # Choose model: full or mini version\n",
+ " model = \"ctxl-rerank-v2-instruct-multilingual\" # Full model\n",
+ " # model = \"ctxl-rerank-v2-instruct-multilingual-mini\" # Mini model (faster)\n",
+ "\n",
+ " # Call the Contextual AI reranker\n",
+ " rerank_response = client.rerank.create(\n",
+ " query=query,\n",
+ " instruction=instruction,\n",
+ " documents=candidate_docs,\n",
+ " model=model\n",
+ " )\n",
+ "\n",
+ " # Extract scores from the response\n",
+ " response_dict = rerank_response.to_dict()\n",
+ "\n",
+ " # FIXED: Use 'relevance_score' instead of 'score'\n",
+ " if 'results' in response_dict:\n",
+ " # Create mapping from index to score\n",
+ " index_to_score = {\n",
+ " result.get('index', 0): result.get('relevance_score', 0.0)\n",
+ " for result in response_dict['results']\n",
+ " }\n",
+ "\n",
+ " # Return scores in original document order\n",
+ " scores = [index_to_score.get(i, 0.0) for i in range(len(candidate_docs))]\n",
+ " else:\n",
+ " # Fallback: if response format is different\n",
+ " scores = [1.0] * len(candidate_docs)\n",
+ "\n",
+ " return scores\n",
+ "\n",
+ " except Exception as e:\n",
+ " print(f\"Error calling Contextual AI reranker: {e}\")\n",
+ " # Fallback to uniform scores if API call fails\n",
+ " return [1.0] * len(candidate_docs)\n",
+ "\n",
+ "print(\"Note: Using 'relevance_score' field for proper score extraction\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A9GAQNKMfyxm"
+ },
+ "source": [
+ "## 5. Define Baseline Reranker (for comparison)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "3aw6zcQBfyxm"
+ },
+ "outputs": [],
+ "source": [
+ "# Baseline reranker function\n",
+ "def simple_baseline_reranker_with_scores(query: str, candidate_docs: List[str], candidate_ids: List[str]) -> List[float]:\n",
+ " \"\"\"Simple baseline reranker that returns uniform scores (no reranking)\"\"\"\n",
+ " return [1.0] * len(candidate_ids)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "F-5f70eUfyxm"
+ },
+ "source": [
+ "## 6. Dataset Analysis\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "H1JrfiFCfyxm"
+ },
+ "outputs": [],
+ "source": [
+ "def analyze_dataset_speed(dataset):\n",
+ " \"\"\"Analyze the dataset to understand processing requirements\"\"\"\n",
+ " print(\"Dataset Analysis for Speed Verification\")\n",
+ " print(\"=\" * 50)\n",
+ "\n",
+ " total_examples = len(dataset)\n",
+ " print(f\"Total examples: {total_examples}\")\n",
+ "\n",
+ " # Analyze candidate document counts\n",
+ " candidate_counts = []\n",
+ " doc_lengths = []\n",
+ " query_lengths = []\n",
+ "\n",
+ " for example in dataset:\n",
+ " num_candidates = len(example.get('candidate_docs', []))\n",
+ " candidate_counts.append(num_candidates)\n",
+ "\n",
+ " if 'candidate_docs' in example and example['candidate_docs']:\n",
+ " doc_lengths.extend([len(doc) for doc in example['candidate_docs']])\n",
+ "\n",
+ " if 'query' in example:\n",
+ " query_lengths.append(len(example['query']))\n",
+ "\n",
+ " print(f\"\\nDataset Statistics:\")\n",
+ " print(f\"Average candidates per query: {np.mean(candidate_counts):.1f}\")\n",
+ " print(f\"Min candidates: {min(candidate_counts)}\")\n",
+ " print(f\"Max candidates: {max(candidate_counts)}\")\n",
+ " print(f\"Average document length: {np.mean(doc_lengths):.0f} characters\")\n",
+ " print(f\"Average query length: {np.mean(query_lengths):.0f} characters\")\n",
+ "\n",
+ "# Run analysis\n",
+ "analyze_dataset_speed(dataset['test'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "hr4jIRDBfyxm"
+ },
+ "source": [
+ "## 7. Run Baseline Evaluation\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "7k-74QJVfyxm"
+ },
+ "outputs": [],
+ "source": [
+ "# Test baseline reranker\n",
+ "print(\"Testing baseline reranker...\")\n",
+ "baseline_robust_results = evaluate_reranker_robust(dataset['test'], simple_baseline_reranker_with_scores)\n",
+ "\n",
+ "print(\"\\nBaseline Results:\")\n",
+ "for metric, value in baseline_robust_results.items():\n",
+ " print(f\" {metric}: {value:.4f}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "poTWmV22fyxm"
+ },
+ "source": [
+ "## 8. Run Contextual AI Reranker Evaluation\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "jGgPALjwfyxm"
+ },
+ "outputs": [],
+ "source": [
+ "# Test Contextual AI reranker\n",
+ "print(\"Testing Contextual AI reranker...\")\n",
+ "start_time = time.time()\n",
+ "\n",
+ "contextual_ai_results = evaluate_reranker_robust(dataset['test'], contextual_ai_reranker)\n",
+ "\n",
+ "elapsed_time = time.time() - start_time\n",
+ "\n",
+ "print(\"\\nContextual AI Results:\")\n",
+ "for metric, value in contextual_ai_results.items():\n",
+ " print(f\" {metric}: {value:.4f}\")\n",
+ "\n",
+ "print(f\"\\nProcessing time: {elapsed_time:.1f} seconds ({elapsed_time/60:.1f} minutes)\")\n",
+ "print(f\"Per example: {elapsed_time/len(dataset['test']):.2f} seconds\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "AcDAWjJhfyxm"
+ },
+ "source": [
+ "## 10. Results Comparison\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "k2hnhsKUfyxn"
+ },
+ "outputs": [],
+ "source": [
+ "print(\"\\n\" + \"=\"*50)\n",
+ "print(\"Comparison:\")\n",
+ "print(\"Baseline Results:\")\n",
+ "for metric, value in baseline_robust_results.items():\n",
+ " print(f\" {metric}: {value:.4f}\")\n",
+ "\n",
+ "print(\"\\nContextual AI Results:\")\n",
+ "for metric, value in contextual_ai_results.items():\n",
+ " print(f\" {metric}: {value:.4f}\")\n",
+ "\n",
+ "# Calculate improvement\n",
+ "print(\"\\nImprovement over baseline:\")\n",
+ "for metric in baseline_robust_results.keys():\n",
+ " if metric in contextual_ai_results:\n",
+ " baseline_val = baseline_robust_results[metric]\n",
+ " contextual_val = contextual_ai_results[metric]\n",
+ " improvement = ((contextual_val - baseline_val) / baseline_val) * 100 if baseline_val > 0 else 0\n",
+ " print(f\" {metric}: {improvement:+.1f}%\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zX3450rKfyxn"
+ },
+ "source": [
+ "## 10. Test on Single Example (Debugging)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "tfqrWxuafyxn"
+ },
+ "outputs": [],
+ "source": [
+ "# Test on a single example to see how the reranker works\n",
+ "example = dataset['test'][1]\n",
+ "\n",
+ "print(f\"Query: {example['query']}\")\n",
+ "print(f\"Number of candidates: {len(example['candidate_docs'])}\")\n",
+ "\n",
+ "# Get scores from Contextual AI\n",
+ "scores = contextual_ai_reranker(\n",
+ " example['query'],\n",
+ " example['candidate_docs'],\n",
+ " example['candidate_ids']\n",
+ ")\n",
+ "\n",
+ "# Check if we're getting non-zero scores\n",
+ "non_zero_scores = [s for s in scores if s != 0.0]\n",
+ "print(f\"\\nNon-zero scores: {len(non_zero_scores)} out of {len(scores)}\")\n",
+ "print(f\"Score range: {min(scores):.4f} to {max(scores):.4f}\")\n",
+ "\n",
+ "# Show top 5 documents by score\n",
+ "doc_scores = list(zip(example['candidate_ids'], scores, example['candidate_docs']))\n",
+ "doc_scores.sort(key=lambda x: x[1], reverse=True)\n",
+ "\n",
+ "print(\"\\nTop 5 documents by relevance score:\")\n",
+ "for i, (doc_id, score, text) in enumerate(doc_scores[:5]):\n",
+ " print(f\"\\n{i+1}. Score: {score:.4f}\")\n",
+ " print(f\" ID: {doc_id}\")\n",
+ " print(f\" Text: {text[:200]}...\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uViPw-_Wfyxn"
+ },
+ "source": []
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}