
Thread Transfer

Cross-Session Context: Maintaining Continuity Across Conversations

Every new session starts from zero unless you build memory. Here's how to maintain context across sessions without exploding token costs.

Jorgo Bardho

Founder, Thread Transfer

July 14, 2025 · 14 min read
cross-session · context continuity · memory · user experience
[Figure: Cross-session context flow diagram]

"To be autonomous you have to carry context through a bunch of actions, but the models are very disconnected and don't have continuity the way we do." Microsoft's deputy CTO Sam Schillace identified the core challenge: the disconnected models problem. AI agents excel at single sessions but fail when context must persist across restarts, handoffs, or multiple interactions.

The Cross-Session Context Problem

Every AI interaction starts with a blank slate unless you engineer persistence. When a user returns hours or days later, the agent has no memory of prior conversations, decisions, or learned preferences. When one agent hands off to another, context evaporates. When a system restarts, state is lost.

This isn't just an inconvenience—it's a fundamental architectural challenge. As agents tackle longer horizons, "context management" can no longer mean "string manipulation." It must be treated as an architectural concern alongside storage and compute.

Why Cross-Session Context Matters

  • User experience: Repeating context wastes time and frustrates users
  • Agent autonomy: Long-running tasks require memory across interruptions
  • Multi-agent systems: Agents must share context to collaborate
  • Cost efficiency: Re-processing context on every session burns tokens
  • Decision continuity: Context loss leads to contradictory decisions

Approach 1: Session Compaction

The Claude Agent SDK has context management capabilities such as compaction, which enables an agent to work on a task without exhausting the context window. However, compaction alone isn't sufficient for long-running tasks.

How It Works

Periodically summarize conversation history to reduce token count while preserving key information.

async function compact_session(messages: Message[]) {
  if (messages.length < 20) return messages

  // Keep recent messages, summarize older ones
  const recent = messages.slice(-10)
  const old = messages.slice(0, -10)

  const summary = await llm.generate({
    system: "Summarize this conversation history, preserving key decisions, facts, and user preferences.",
    messages: old
  })

  return [
    { role: "system", content: `Previous context: ${summary}` },
    ...recent
  ]
}

Pros

  • Simple to implement
  • Reduces token usage by 60-80%
  • Works within single-session architecture

Cons

  • Lossy compression—details get dropped
  • Doesn't persist across restarts
  • Summarization quality varies
  • Still limited by context window

Approach 2: Initializer + Coding Agent Pattern

Anthropic developed a two-fold solution to enable the Claude Agent SDK to work effectively across many context windows: an initializer agent that sets up the environment on the first run, and a coding agent that is tasked with making incremental progress in every session, while leaving clear artifacts for the next session.

How It Works

  1. Initializer agent (session 1): Sets up project structure, creates TODO files, writes initial documentation
  2. Coding agent (sessions 2-N): Reads TODO, makes progress, updates artifacts for next session
  3. Artifacts: TODO.md, PROGRESS.md, STATE.json files that persist on disk

// Session 1: Initializer
await initializer.run({
  task: "Set up project for building authentication system",
  outputs: ["TODO.md", "ARCHITECTURE.md", "src/"]
})

// Session 2-N: Coding agent
while (!task_complete) {
  await coding_agent.run({
    instructions: "Read TODO.md, make progress, update TODO.md for next session",
    artifacts: ["TODO.md", "PROGRESS.md", "src/"]
  })
}

Benefits

  • Context persists in files, not memory
  • Each session can have fresh context window
  • Clear handoff mechanism between sessions
  • Works across restarts and days-long tasks

Limitations

  • Requires file system access
  • Agent must be trained to read/write artifacts
  • Not suitable for real-time conversational agents

Approach 3: Shared Memory Systems

"What we're almost talking about is just managing state over long periods of time," as one expert described the concept of persistent, shared memory for AI agents. This shared memory transcends the individual agent's internal state, acting as a centralized repository of information that all agents within a system can access and contribute to.

Architecture

class SharedMemory {
  async read(user_id: string, memory_type: string) {
    return await db.query({
      table: "agent_memory",
      where: { user_id, type: memory_type },
      order_by: "timestamp DESC",
      limit: 100
    })
  }

  async write(user_id: string, memory: Memory) {
    await db.insert({
      table: "agent_memory",
      data: {
        user_id,
        type: memory.type,
        content: memory.content,
        metadata: memory.metadata,
        timestamp: Date.now()
      }
    })
  }
}

// Usage in agent
const user_prefs = await memory.read(user.id, "preferences")
const past_decisions = await memory.read(user.id, "decisions")
const learned_facts = await memory.read(user.id, "facts")

// Inject into prompt
const context = `
  User preferences: ${JSON.stringify(user_prefs)}
  Past decisions: ${JSON.stringify(past_decisions)}
  Known facts: ${JSON.stringify(learned_facts)}
`

Memory Types

  • Preferences: User settings, communication style, tool preferences
  • Facts: Learned information about the user's domain, projects, team
  • Decisions: Past choices and their rationale
  • Tasks: Ongoing work, incomplete actions, reminders
  • Observations: Agent insights from past interactions
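The `Memory` value passed to `SharedMemory.write` above is never defined in the snippet. As a minimal sketch, the five memory types could be modeled as a tagged record; the type and field names here are assumptions for illustration, not an established schema:

```typescript
// Hypothetical shape for entries in the shared memory store.
// The five `type` values mirror the memory types listed above.
type MemoryType = "preferences" | "facts" | "decisions" | "tasks" | "observations"

interface Memory {
  type: MemoryType
  content: string                    // the remembered statement itself
  metadata: Record<string, unknown>  // e.g. source session, confidence
}

const MEMORY_TYPES: MemoryType[] = [
  "preferences", "facts", "decisions", "tasks", "observations"
]

// Small guard: reject malformed records before they reach the database.
function isValidMemory(m: Memory): boolean {
  return MEMORY_TYPES.includes(m.type) && m.content.trim().length > 0
}
```

Keeping the `type` field a closed union (rather than a free-form string) makes the store queryable by category, which is what the `memory.read(user.id, "preferences")` calls above rely on.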

Benefits

  • Works across sessions, agents, and systems
  • Structured, queryable memory
  • Agents can enrich memory with observations
  • Scales to millions of users

Approach 4: Model Context Protocol (MCP)

The Model Context Protocol (MCP) offers promising solutions to context management challenges. MCP provides a standardized framework for connecting AI models with external data sources and tools, enabling more effective context retention and sharing across agent interactions.

How It Works

MCP servers expose context as resources that any MCP-compatible agent can retrieve:

// MCP server exposing conversation history
server.resource({
  name: "conversation_history",
  description: "User's conversation history across sessions",
  uri: "history://{user_id}/{session_id}",
  handler: async ({ user_id, session_id }) => {
    const history = await db.getConversationHistory(user_id, session_id)
    return {
      mimeType: "application/json",
      content: JSON.stringify(history)
    }
  }
})

// Any MCP client can now access it
const history = await mcp.readResource("history://user123/session456")

Benefits

  • Standardized cross-session context access
  • Works with any MCP-compatible agent or tool
  • Composable with other MCP resources (files, databases, APIs)
  • Rapid adoption across the AI ecosystem since early 2025

Approach 5: Context Offloading + Retrieval

Context Engineering includes Context Offloading (moving information to external systems) and Context Retrieval (adding information dynamically).

Architecture

  1. Offload: Store full conversation history in external database
  2. Index: Embed conversations for semantic search
  3. Retrieve: Fetch relevant past conversations on-demand
  4. Inject: Add to current session context

async function retrieve_relevant_context(query: string, user_id: string) {
  // Embed current query
  const query_embedding = await embed(query)

  // Search past conversations
  const past_context = await vector_db.query({
    vector: query_embedding,
    filter: { user_id },
    top_k: 3
  })

  return past_context.map(c => ({
    timestamp: c.metadata.timestamp,
    summary: c.metadata.summary,
    decisions: c.metadata.decisions
  }))
}

// In current session
const past = await retrieve_relevant_context(user_query, user.id)
const prompt = `
  Relevant past conversations:
  ${JSON.stringify(past, null, 2)}

  Current query: ${user_query}
`

Benefits

  • Scales to unlimited conversation history
  • Retrieves only relevant past context
  • Reduces token usage compared to loading full history
  • Works across sessions and agents

Approach 6: Context Isolation + Scoping

Context Isolation is the practice of separating context by scope to prevent pollution and ensure agents only access relevant information.

Implementation

class ContextManager {
  scopes = {
    global: [],      // Available to all agents
    user: {},        // Per-user context
    session: {},     // Per-session context
    agent: {}        // Per-agent context
  }

  async getContext(user_id: string, session_id: string, agent_id: string) {
    return {
      global: this.scopes.global,
      user: this.scopes.user[user_id] || [],
      session: this.scopes.session[session_id] || [],
      agent: this.scopes.agent[agent_id] || []
    }
  }

  async isolate(scope: string, context: any) {
    // Ensure context in this scope doesn't leak to others
    this.scopes[scope] = sanitize(context)
  }
}

Scoping Strategies

Scope     Lifetime             Example Context
Global    Permanent            System instructions, API schemas, company policies
User      Permanent            Preferences, learned facts, historical decisions
Session   Hours to days        Current task state, recent messages, working memory
Turn      Single interaction   Retrieved documents, function call results
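The lifetimes in the table above can be enforced mechanically rather than by convention. A sketch of TTL-based expiry on read, with illustrative TTL values (the exact durations would be tuned per deployment):

```typescript
// Illustrative lifetimes per scope, in milliseconds (Infinity = permanent).
const SCOPE_TTL_MS: Record<string, number> = {
  global: Infinity,
  user: Infinity,
  session: 24 * 60 * 60 * 1000,  // "hours to days": one day here
  turn: 0                         // single interaction: never reused later
}

interface ScopedEntry { value: string; storedAt: number }

// Return only the entries still within their scope's lifetime.
function liveEntries(scope: string, entries: ScopedEntry[], now: number): ScopedEntry[] {
  const ttl = SCOPE_TTL_MS[scope] ?? 0  // unknown scopes expire immediately
  return entries.filter(e => now - e.storedAt <= ttl)
}
```

Filtering on read (rather than eagerly deleting) keeps the implementation simple; a background sweep can reclaim expired rows separately.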

Production Architecture: Multi-Tier Context

Production systems combine multiple approaches in a tiered architecture:

Tier 1: Hot Context (Always Loaded)

  • System instructions
  • User preferences
  • Current task state
  • Recent conversation (last 5-10 turns)
  • Target: Under 5K tokens

Tier 2: Warm Context (Cached, Frequently Accessed)

  • Conversation summaries from current session
  • Learned facts about user's domain
  • Recent decisions and their rationale
  • Frequently-accessed documents
  • Target: 20-50K tokens, cached for 5-60 minutes

Tier 3: Cold Context (Retrieved On-Demand)

  • Full conversation history (all sessions)
  • Past projects and outcomes
  • Rarely-accessed documents
  • Historical data and archives
  • Retrieved only when relevant to current query

Implementation

async function build_cross_session_context(
  user_id: string,
  session_id: string,
  current_query: string
) {
  // Tier 1: Hot (always loaded)
  const hot = {
    system: await getSystemInstructions(),
    user_prefs: await getPreferences(user_id),
    task_state: await getTaskState(session_id),
    recent: await getRecentMessages(session_id, 10)
  }

  // Tier 2: Warm (cached)
  const warm = await cache.get(`context_${session_id}`, async () => ({
    session_summary: await summarizeSession(session_id),
    learned_facts: await getLearnedFacts(user_id),
    recent_decisions: await getDecisions(user_id, 30)  // last 30 days
  }))

  // Tier 3: Cold (retrieved)
  const cold = await retrieveRelevantHistory(current_query, user_id)

  return { hot, warm, cold }
}
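The `cache.get(key, loader)` call in the warm tier above assumes a read-through cache with a TTL. A minimal in-memory sketch of that helper, matched to the usage shown (the interface is an assumption, not a specific library's API):

```typescript
// Read-through cache: return the cached value if still fresh,
// otherwise run the loader, store the result, and return it.
class TTLCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>()

  constructor(private ttlMs: number) {}

  async get(key: string, loader: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key)
    if (hit && hit.expiresAt > Date.now()) return hit.value  // warm hit: no recompute
    const value = await loader()                              // miss or expired: compute once
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs })
    return value
  }
}
```

With a 5-60 minute TTL, this keeps warm-tier summaries from being recomputed on every turn, at the cost of slightly stale context; in production the `Map` would typically be replaced by a shared store such as Redis.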

Observability: Monitoring Cross-Session Context

Track these metrics to ensure context persistence works:

Metric                    Target   What It Measures
Context recall rate       >90%     % of relevant past context successfully retrieved
Session continuity score  >85%     How well context flows across session boundaries
Memory utilization        50-70%   % of stored context actually used in responses
Contradiction rate        <5%      % of responses contradicting past statements
Re-ask rate               <10%     % of questions already answered in past sessions
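Two of these metrics are straightforward to compute from logged interactions. A sketch for re-ask rate and memory utilization, assuming each interaction is logged with the fields below (the field names are illustrative, not a standard schema):

```typescript
interface Interaction {
  question: string
  answeredInPastSession: boolean  // matched against the prior Q&A log
  contextIdsLoaded: string[]      // stored context injected into the prompt
  contextIdsUsed: string[]        // subset actually referenced in the response
}

// Re-ask rate: share of questions already answered in past sessions.
function reAskRate(log: Interaction[]): number {
  if (log.length === 0) return 0
  return log.filter(i => i.answeredInPastSession).length / log.length
}

// Memory utilization: share of loaded context the responses actually used.
function memoryUtilization(log: Interaction[]): number {
  const loaded = log.reduce((n, i) => n + i.contextIdsLoaded.length, 0)
  const used = log.reduce((n, i) => n + i.contextIdsUsed.length, 0)
  return loaded === 0 ? 0 : used / loaded
}
```

The harder part in practice is producing the labels: detecting that a question was "already answered" or that a context item was "actually used" typically requires an LLM judge or embedding-similarity check over the logs.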

Thread Transfer's Cross-Session Solution

Thread Transfer bundles are designed for cross-session context persistence. A bundle created from a Slack thread or Linear issue becomes a portable context unit that can be:

  • Stored in shared memory for access by any agent
  • Cached for fast warm-tier access
  • Retrieved semantically when relevant to new sessions
  • Injected deterministically to ensure consistency

Instead of re-processing 100-message Slack threads on every session, load the bundle once and get:

  • Key decisions and their rationale
  • Action items and owners
  • Relevant stakeholders
  • Timeline and milestones
  • Links to related threads

This delivers 40-80% token savings while enabling perfect context continuity across sessions, agents, and systems.
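As a sketch, a bundle like the one described could be modeled as a typed record with a deterministic renderer; this shape is an illustration mirroring the fields listed above, not Thread Transfer's actual schema:

```typescript
// Hypothetical bundle shape mirroring the fields listed above.
interface ContextBundle {
  source: { kind: "slack_thread" | "linear_issue"; id: string }
  decisions: { summary: string; rationale: string }[]
  actionItems: { description: string; owner: string }[]
  stakeholders: string[]
  milestones: { name: string; due: string }[]
  relatedThreads: string[]  // links to related bundles
}

// Deterministic injection: the same bundle always yields the same prompt text,
// so agents reading it across sessions see identical context.
function renderBundle(b: ContextBundle): string {
  return [
    `Source: ${b.source.kind} ${b.source.id}`,
    ...b.decisions.map(d => `Decision: ${d.summary} (${d.rationale})`),
    ...b.actionItems.map(a => `Action: ${a.description} (owner: ${a.owner})`),
    ...b.stakeholders.map(s => `Stakeholder: ${s}`)
  ].join("\n")
}
```

Determinism is what distinguishes this from re-summarizing the thread each session, where summarization variance can introduce the contradictions the observability section warns about.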

Enterprise Context Management

Fragmented RAG pipelines and application-specific context engineering can't keep up with enterprise agentic systems. Systematic, organization-wide context management is what will separate stalled AI experiments from accelerated agentic AI adoption.

Key Requirements

  • Metadata designed for AI: AI agents need rich, interconnected context that enables them to read, write, and act safely on enterprise data
  • Access control: Context must respect user permissions and data isolation
  • Audit trails: Track what context was accessed and how it influenced decisions
  • Version control: Context evolves; agents need access to correct versions

Best Practices

  1. Design for persistence from day 1: Don't bolt on cross-session support later
  2. Use tiered architecture: Hot, warm, cold context with different persistence strategies
  3. Implement shared memory: Central repository accessible by all agents
  4. Cache aggressively: Reduce re-computation and token costs
  5. Scope context appropriately: Global, user, session, turn
  6. Monitor continuity metrics: Ensure context actually persists
  7. Enforce access control: Context leakage is a security risk
  8. Summarize periodically: Prevent context bloat

The Path Forward

The answer to cross-session context isn't just bigger windows—it's smarter context orchestration. Systems that treat context as infrastructure, with proper persistence, retrieval, and lifecycle management, will enable truly autonomous agents that can work on tasks spanning days, weeks, or months.

Start with shared memory for user preferences and learned facts. Add conversation summaries for warm-tier access. Implement semantic retrieval for cold-tier history. Layer in caching to reduce costs. Build observability to ensure it works.

The goal: agents that remember, learn, and maintain continuity across every interaction.