
Krrish Dholakia
Ishaan Jaffer
info

Get a 7 day free trial for LiteLLM Enterprise here.

no call needed

New Models / Updated Models​

  1. New OpenAI /image/variations endpoint BETA support Docs
  2. Topaz API support on OpenAI /image/variations BETA endpoint Docs
  3. Deepseek - r1 support w/ reasoning_content (Deepseek API, Vertex AI, Bedrock)
  4. Azure - Add azure o1 pricing See Here
  5. Anthropic - handle -latest tag in model for cost calculation
  6. Gemini-2.0-flash-thinking - add model pricing (it’s 0.0) See Here
  7. Bedrock - add stability sd3 model pricing See Here (s/o Marty Sullivan)
  8. Bedrock - add us.amazon.nova-lite-v1:0 to model cost map See Here
  9. TogetherAI - add new together_ai llama3.3 models See Here

LLM Translation​

  1. LM Studio -> fix async embedding call
  2. GPT-4o models - fix response_format translation
  3. Bedrock nova - expand supported document types to include .md, .csv, etc. Start Here
  4. Bedrock - docs on IAM role based access for bedrock - Start Here
  5. Bedrock - cache IAM role credentials when used
  6. Google AI Studio (gemini/) - support gemini 'frequency_penalty' and 'presence_penalty'
  7. Azure O1 - fix model name check
  8. WatsonX - ZenAPIKey support for WatsonX Docs
  9. Ollama Chat - support json schema response format Start Here
  10. Bedrock - return correct bedrock status code and error message if error during streaming
  11. Anthropic - support nested json schema on anthropic calls
  12. OpenAI - metadata param preview support
    1. SDK - enable via litellm.enable_preview_features = True
    2. PROXY - enable via litellm_settings::enable_preview_features: true (see the yaml sketch after this list)
  13. Replicate - retry completion response on status=processing
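For the proxy, the flag in item 12 corresponds to this proxy_config.yaml snippet (a minimal sketch using only the setting named above):

litellm_settings:
  enable_preview_features: true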

Spend Tracking Improvements​

  1. Bedrock - QA asserts all bedrock regional models have same supported_ as base model
  2. Bedrock - fix bedrock converse cost tracking w/ region name specified
  3. Spend Logs reliability fix - handle the case where the user passed in the request body is an int instead of a string
  4. Ensure 'base_model' cost tracking works across all endpoints
  5. Fixes for Image generation cost tracking
  6. Anthropic - fix anthropic end user cost tracking
  7. JWT / OIDC Auth - add end user id tracking from jwt auth

Management Endpoints / UI​

  1. Allow a team member to become admin post-add (UI + endpoints)
  2. New edit/delete button for updating team membership on UI
  3. If team admin - show all team keys
  4. Model Hub - clarify cost of models is per 1m tokens
  5. Invitation Links - fix invalid url generated
  6. New - SpendLogs Table Viewer - allows proxy admin to view spend logs on UI
    1. New spend logs - allow proxy admin to 'opt in' to logging request/response in spend logs table - enables easier abuse detection
    2. Show country of origin in spend logs
    3. Add pagination + filtering by key name/team name
  7. /key/delete - allow team admin to delete team keys
  8. Internal User 'view' - fix spend calculation when team selected
  9. Model Analytics is now available on the Free tier
  10. Usage page - shows days when spend = 0, and round spend on charts to 2 sig figs
  11. Public Teams - allow admins to expose teams for new users to 'join' on UI - Start Here
  12. Guardrails
    1. set/edit guardrails on a virtual key
    2. Allow setting guardrails on a team
    3. Set guardrails on team create + edit page
  13. Support temporary budget increases on /key/update - new temp_budget_increase and temp_budget_expiry fields - Start Here
  14. Support writing new key alias to AWS Secret Manager - on key rotation Start Here

Helm​

  1. add securityContext and pull policy values to migration job (s/o https://github.com/Hexoplon)
  2. allow specifying envVars on values.yaml
  3. new helm lint test

Logging / Guardrail Integrations​

  1. Log the used prompt when prompt management used. Start Here
  2. Support s3 logging with team alias prefixes - Start Here
  3. Prometheus Start Here
    1. fix litellm_llm_api_time_to_first_token_metric not populating for bedrock models
    2. emit remaining team budget metric on a regular basis (even when a call isn't made) - allows for more stable metrics on Grafana/etc.
    3. add key and team level budget metrics
    4. emit litellm_overhead_latency_metric
    5. Emit litellm_team_budget_reset_at_metric and litellm_api_key_budget_remaining_hours_metric
  4. Datadog - support logging spend tags to Datadog. Start Here
  5. Langfuse - fix logging request tags, read from standard logging payload
  6. GCS - don’t truncate payload on logging
  7. New GCS Pub/Sub logging support Start Here (see the config sketch after this list)
  8. Add AIM Guardrails support Start Here
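As a rough sketch, the GCS Pub/Sub logging in item 7 is enabled like any other logging callback in proxy_config.yaml. The callback name and environment variable names below are assumptions - confirm them against the linked docs:

litellm_settings:
  callbacks: ["gcs_pubsub"]   # assumed callback name

environment_variables:
  GCS_PUBSUB_TOPIC_ID: "litellm-spend-logs"   # assumed env var name
  GCS_PUBSUB_PROJECT_ID: "my-gcp-project"     # assumed env var name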

Security​

  1. New Enterprise SLA for patching security vulnerabilities. See Here
  2. Hashicorp - support using vault namespace for TLS auth. Start Here
  3. Azure - DefaultAzureCredential support

Health Checks​

  1. Cleanup pricing-only model names from wildcard route list - prevent bad health checks
  2. Allow specifying a health check model for wildcard routes - https://docs.litellm.ai/docs/proxy/health#wildcard-routes
  3. New 'health_check_timeout' param with a default 1-minute upper bound, to prevent a bad model's health check from hanging and causing pod restarts. Start Here (see the config sketch after this list)
  4. Datadog - add Datadog service health check + expose new /health/services endpoint. Start Here
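A minimal sketch of the 'health_check_timeout' setting from item 3. Placing it under model_info is an assumption for illustration - check the linked docs for the exact location:

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      health_check_timeout: 60   # seconds - assumed placement; bounds how long a health check may hang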

Performance / Reliability improvements​

  1. 3x increase in RPS - moving to orjson for reading request body
  2. LLM Routing speedup - using cached get model group info
  3. SDK speedup - using cached get model info helper - reduces CPU work to get model info
  4. Proxy speedup - only read request body 1 time per request
  5. Infinite loop detection scripts added to codebase
  6. Bedrock - pure async image transformation requests
  7. Cooldowns - cool down a single-deployment model group if 100% of calls fail in high traffic - prevents an o1 outage from impacting other calls
  8. Response Headers - return
    1. x-litellm-timeout
    2. x-litellm-attempted-retries
    3. x-litellm-overhead-duration-ms
    4. x-litellm-response-duration-ms
  9. ensure duplicate callbacks are not added to proxy
  10. Requirements.txt - bump certifi version

General Proxy Improvements​

  1. JWT / OIDC Auth - new enforce_rbac param - allows proxy admin to prevent any unmapped yet authenticated JWT tokens from calling the proxy. Start Here (see the config sketch after this list)
  2. fix custom OpenAPI schema generation for customized Swagger docs
  3. Request Headers - support reading x-litellm-timeout param from request headers. Enables model timeout control when using Vercel’s AI SDK + LiteLLM Proxy. Start Here
  4. JWT / OIDC Auth - new role based permissions for model authentication. See Here
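A minimal sketch of the enforce_rbac setting from item 1, assuming it sits inside the existing JWT auth block of proxy_config.yaml (the litellm_jwtauth placement is an assumption - see the linked docs):

general_settings:
  enable_jwt_auth: true
  litellm_jwtauth:
    enforce_rbac: true   # assumed placement - authenticated JWTs with no mapped role are rejected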

Complete Git Diff​

This is the diff between v1.57.8-stable and v1.59.8-stable.

Use this to see the changes in the codebase.

Git Diff

Krrish Dholakia
Ishaan Jaffer
info

Get a 7 day free trial for LiteLLM Enterprise here.

no call needed

UI Improvements​

[Opt In] Admin UI - view messages / responses​

You can now view messages and response logs on Admin UI.

How to enable it - add store_prompts_in_spend_logs: true to your proxy_config.yaml

Once this flag is enabled, your messages and responses will be stored in the LiteLLM_Spend_Logs table.

general_settings:
  store_prompts_in_spend_logs: true

DB Schema Change​

Added messages and responses to the LiteLLM_Spend_Logs table.

By default this is not logged. If you want messages and responses to be logged, you need to opt in with this setting

general_settings:
  store_prompts_in_spend_logs: true

Krrish Dholakia
Ishaan Jaffer

alerting, prometheus, secret management, management endpoints, ui, prompt management, finetuning, batch

note

v1.57.8-stable is currently being tested. It will be released on 2025-01-12.

New / Updated Models​

  1. Mistral large pricing - https://github.com/BerriAI/litellm/pull/7452
  2. Cohere command-r7b-12-2024 pricing - https://github.com/BerriAI/litellm/pull/7553/files
  3. Voyage - new models, prices and context window information - https://github.com/BerriAI/litellm/pull/7472
  4. Anthropic - bump Bedrock claude-3-5-haiku max_output_tokens to 8192

General Proxy Improvements​

  1. Health check support for realtime models
  2. Support calling Azure realtime routes via virtual keys
  3. Support custom tokenizer on /utils/token_counter - useful when checking token count for self-hosted models
  4. Request Prioritization - support on /v1/completion endpoint as well

LLM Translation Improvements​

  1. Deepgram STT support. Start Here
  2. OpenAI Moderations - omni-moderation-latest support. Start Here
  3. Azure O1 - fake streaming support. This ensures that if stream=true is passed, the response is streamed. Start Here
  4. Anthropic - non-whitespace char stop sequence handling - PR
  5. Azure OpenAI - support Entra ID username + password based auth. Start Here
  6. LM Studio - embedding route support. Start Here
  7. WatsonX - ZenAPIKeyAuth support. Start Here

Prompt Management Improvements​

  1. Langfuse integration
  2. HumanLoop integration
  3. Support for using load balanced models
  4. Support for loading optional params from prompt manager

Start Here

Finetuning + Batch APIs Improvements​

  1. Improved unified endpoint support for Vertex AI finetuning - PR
  2. Add support for retrieving vertex api batch jobs - PR

NEW Alerting Integration​

PagerDuty Alerting Integration.

Handles two types of alerts:

  • High LLM API Failure Rate. Configure X failures in Y seconds to trigger an alert.
  • High Number of Hanging LLM Requests. Configure X hanging requests in Y seconds to trigger an alert.

Start Here
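A minimal sketch of enabling PagerDuty alerting on the proxy. The integration name and the threshold parameter names are assumptions for illustration - check the linked docs for the exact keys (PAGERDUTY_API_KEY is set in the environment):

general_settings:
  alerting: ["pagerduty"]                 # assumed integration name
  alerting_args:                          # hypothetical parameter names for the X-in-Y thresholds above
    failure_threshold: 5                  # X failures ...
    failure_threshold_window_seconds: 60  # ... within Y seconds triggers an alert
    hanging_threshold_seconds: 60         # a request is considered hanging after Y seconds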

Prometheus Improvements​

Added support for tracking latency/spend/tokens based on custom metrics. Start Here

NEW Hashicorp Secret Manager Support​

Support for reading credentials + writing LLM API keys. Start Here
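A minimal sketch of pointing the proxy at Hashicorp Vault for key management. The key_management_system value is an assumption - confirm it against the linked docs:

general_settings:
  key_management_system: "hashicorp_vault"   # assumed value

The Vault address, auth token or TLS certs, and namespace are supplied via environment variables as described in the linked docs.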

Management Endpoints / UI Improvements​

  1. Create and view organizations + assign org admins on the Proxy UI
  2. Support deleting keys by key_alias
  3. Allow assigning teams to org on UI
  4. Disable using ui session token for 'test key' pane
  5. Show model used in 'test key' pane
  6. Support markdown output in 'test key' pane

Helm Improvements​

  1. Prevent istio injection for db migrations cron job
  2. allow using migrationJob.enabled variable within job

Logging Improvements​

  1. braintrust logging: respect project_id, add more metrics - https://github.com/BerriAI/litellm/pull/7613
  2. Athina - support base url - ATHINA_BASE_URL
  3. Lunary - Allow passing custom parent run id to LLM Calls

Git Diff​

This is the diff between v1.56.3-stable and v1.57.8-stable.

Use this to see the changes in the codebase.

Git Diff

Krrish Dholakia
Ishaan Jaffer

langfuse, management endpoints, ui, prometheus, secret management

Langfuse Prompt Management​

Langfuse Prompt Management is being labelled as BETA. This allows us to iterate quickly on the feedback we're receiving, and makes the status clearer to users. We expect this feature to be stable by next month (February 2025).

Changes:

  • Include the client message in the LLM API Request. (Previously only the prompt template was sent, and the client message was ignored).
  • Log the prompt template in the logged request (e.g. to s3/langfuse).
  • Log the 'prompt_id' and 'prompt_variables' in the logged request (e.g. to s3/langfuse).

Start Here
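A minimal sketch of wiring a Langfuse-managed prompt into the proxy config, based on the behavior described above. The langfuse/ model prefix and prompt_id parameter are assumptions for illustration:

model_list:
  - model_name: my-langfuse-prompt-model
    litellm_params:
      model: langfuse/gpt-3.5-turbo    # assumed provider prefix
      prompt_id: "my-prompt-id"        # assumed parameter - the prompt template is fetched from Langfuse
      api_key: os.environ/OPENAI_API_KEY

With this, the client message is appended to the fetched prompt template, and prompt_id / prompt_variables show up in the logged request, per the changes above.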

Team/Organization Management + UI Improvements​

Managing teams and organizations on the UI is now easier.

Changes:

  • Support for editing user role within team on UI.
  • Support updating team member role to admin via api - /team/member_update
  • Show team admins all keys for their team.
  • Add organizations with budgets
  • Assign teams to orgs on the UI
  • Auto-assign SSO users to teams

Start Here

Hashicorp Vault Support​

We now support writing LiteLLM Virtual API keys to Hashicorp Vault.

Start Here

Custom Prometheus Metrics​

Define custom prometheus metrics, and track usage/latency/no. of requests against them

This allows for more fine-grained tracking - e.g. on prompt template passed in request metadata

Start Here
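A minimal sketch of surfacing a request-metadata field (e.g. the prompt template) as a custom Prometheus label. The custom_prometheus_metadata_labels key is an assumption - see the linked docs for the exact setting name:

litellm_settings:
  callbacks: ["prometheus"]
  custom_prometheus_metadata_labels: ["metadata.prompt_template"]   # assumed setting name

Requests that pass metadata.prompt_template then get usage/latency/request-count metrics broken out by that label.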

Krrish Dholakia
Ishaan Jaffer

docker image, security, vulnerability

0 Critical/High Vulnerabilities

What changed?​

  • The LiteLLM base image now uses cgr.dev/chainguard/python:latest-dev

Why the change?​

To ensure there are 0 critical/high vulnerabilities on LiteLLM Docker Image

Migration Guide​

  • If you use a custom dockerfile with litellm as a base image + apt-get

Use apk instead of apt-get - the base litellm image no longer has apt-get installed.

You are only impacted if you use apt-get in your Dockerfile

# Use the provided base image
FROM ghcr.io/berriai/litellm:main-latest

# Set the working directory
WORKDIR /app

# Install dependencies - CHANGE THIS to `apk`
RUN apt-get update && apt-get install -y dumb-init

Before Change

RUN apt-get update && apt-get install -y dumb-init

After Change

RUN apk update && apk add --no-cache dumb-init

Krrish Dholakia
Ishaan Jaffer

deepgram, fireworks ai, vision, admin ui, dependency upgrades

New Models​

Deepgram Speech to Text​

New Speech to Text support for Deepgram models. Start Here

from litellm import transcription
import os

# set api keys
os.environ["DEEPGRAM_API_KEY"] = ""
audio_file = open("/path/to/audio.mp3", "rb")

response = transcription(model="deepgram/nova-2", file=audio_file)

print(f"response: {response}")

Fireworks AI - Vision support for all models​

LiteLLM supports document inlining for Fireworks AI models. This is useful for models that are not vision models, but still need to parse documents/images/etc. If the model is not a vision model, LiteLLM will add #transform=inline to the url of the image_url. See Code

Proxy Admin UI​

  • Test Key Tab displays model used in response
  • Test Key Tab renders content in .md, .py (any code/markdown format)

Dependency Upgrades​

Bug Fixes​

Krrish Dholakia
Ishaan Jaffer

guardrails, logging, virtual key management, new models

info

Get a 7 day free trial for LiteLLM Enterprise here.

no call needed

New Features​

✨ Log Guardrail Traces​

Track guardrail failure rate and whether a guardrail is going rogue and failing requests. Start here

Traced Guardrail Success​

Traced Guardrail Failure​

/guardrails/list​

/guardrails/list allows clients to view available guardrails + supported guardrail params

curl -X GET 'http://0.0.0.0:4000/guardrails/list'

Expected response

{
  "guardrails": [
    {
      "guardrail_name": "aporia-post-guard",
      "guardrail_info": {
        "params": [
          {
            "name": "toxicity_score",
            "type": "float",
            "description": "Score between 0-1 indicating content toxicity level"
          },
          {
            "name": "pii_detection",
            "type": "boolean"
          }
        ]
      }
    }
  ]
}

✨ Guardrails with Mock LLM​

Send mock_response to test guardrails without making an LLM call. More info on mock_response here

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-npnwjPQciVRok5yNZgKmFQ" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ],
    "mock_response": "This is a mock response",
    "guardrails": ["aporia-pre-guard", "aporia-post-guard"]
  }'

Assign Keys to Users​

You can now assign keys to users via Proxy UI

New Models​

  • openrouter/openai/o1
  • vertex_ai/mistral-large@2411

Fixes​

Krrish Dholakia
Ishaan Jaffer

key management, budgets/rate limits, logging, guardrails

info

Get a 7 day free trial for LiteLLM Enterprise here.

no call needed

✨ Budget / Rate Limit Tiers​

Define tiers with rate limits. Assign them to keys.

Use this to control access and budgets across a lot of keys.

Start here

curl -L -X POST 'http://0.0.0.0:4000/budget/new' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "budget_id": "high-usage-tier",
    "model_max_budget": {
      "gpt-4o": {"rpm_limit": 1000000}
    }
  }'

OTEL Bug Fix​

LiteLLM was double-logging the litellm_request span. This is now fixed.

Relevant PR

Logging for Finetuning Endpoints​

Logs for finetuning requests are now available on all logging providers (e.g. Datadog).

What's logged per request:

  • file_id
  • finetuning_job_id
  • any key/team metadata

Start Here:

Dynamic Params for Guardrails​

You can now set custom parameters (like success threshold) for your guardrails in each request.

See guardrails spec for more details

Krrish Dholakia
Ishaan Jaffer

batches, guardrails, team management, custom auth


info

Get a free 7-day LiteLLM Enterprise trial here. Start here

No call needed

✨ Cost Tracking, Logging for Batches API (/batches)​

Track cost, usage for Batch Creation Jobs. Start here

✨ /guardrails/list endpoint​

Show available guardrails to users. Start here

✨ Allow teams to add models​

This enables team admins to call their own finetuned models via litellm proxy. Start here

✨ Common checks for custom auth​

Calling the internal common_checks function in custom auth is now enforced as an enterprise feature. This allows admins to use litellm's default budget/auth checks within their custom auth implementation. Start here

✨ Assigning team admins​

The team admins feature is graduating from beta and moving to our enterprise tier. This allows proxy admins to let others manage keys/models for their own teams (useful for projects in production). Start here

Krrish Dholakia
Ishaan Jaffer

A new LiteLLM Stable release just went out. Here are 5 updates since v1.52.2-stable.

langfuse, fallbacks, new models, azure_storage

Langfuse Prompt Management​

This makes it easy to run experiments or change specific models (e.g. gpt-4o to gpt-4o-mini) on Langfuse, instead of making changes in your applications. Start here

Control fallback prompts client-side​

Claude prompts are different from OpenAI prompts.

Pass in prompts specific to model when doing fallbacks. Start here

New Providers / Models​

✨ Azure Data Lake Storage Support​

Send LLM usage (spend, tokens) data to Azure Data Lake. This makes it easy to consume usage data on other services (e.g. Databricks). Start here
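A minimal sketch of enabling the Azure Data Lake logger on the proxy. The callback name and environment variable names are assumptions - confirm them against the linked docs:

litellm_settings:
  callbacks: ["azure_storage"]   # assumed callback name

environment_variables:
  AZURE_STORAGE_ACCOUNT_NAME: "my-storage-account"   # assumed env var names
  AZURE_STORAGE_FILE_SYSTEM: "litellm-usage-logs"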

Docker Run LiteLLM​

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.55.8-stable

Get Daily Updates​

LiteLLM ships new releases every day. Follow us on LinkedIn to get daily updates.