
Thread Transfer

Cross-Session Context: Maintaining Continuity Across Conversations

Every new session starts from zero unless you build memory. Here's how to maintain context across sessions without exploding token costs.

Jorgo Bardho

Founder, Thread Transfer

July 14, 2025 · 14 min read
cross-session · context continuity · memory · user experience
[Figure: Cross-session context flow diagram]

"To be autonomous you have to carry context through a bunch of actions, but the models are very disconnected and don't have continuity the way we do." Microsoft's deputy CTO Sam Schillace identified the core challenge: the disconnected models problem. AI agents excel at single sessions but fail when context must persist across restarts, handoffs, or multiple interactions.

The Cross-Session Context Problem

Every AI interaction starts with a blank slate unless you engineer persistence. When a user returns hours or days later, the agent has no memory of prior conversations, decisions, or learned preferences. When one agent hands off to another, context evaporates. When a system restarts, state is lost.

This isn't just an inconvenience—it's a fundamental architectural challenge. As agents tackle longer horizons, "context management" can no longer mean "string manipulation." It must be treated as an architectural concern alongside storage and compute.

Why Cross-Session Context Matters

  • User experience: Repeating context wastes time and frustrates users
  • Agent autonomy: Long-running tasks require memory across interruptions
  • Multi-agent systems: Agents must share context to collaborate
  • Cost efficiency: Re-processing context on every session burns tokens
  • Decision continuity: Context loss leads to contradictory decisions

Approach 1: Session Compaction

The Claude Agent SDK has context management capabilities such as compaction, which enables an agent to work on a task without exhausting the context window. However, compaction alone isn't sufficient for long-running tasks.

How It Works

Periodically summarize conversation history to reduce token count while preserving key information.

async function compact_session(messages: Message[]) {
  if (messages.length < 20) return messages

  // Keep recent messages, summarize older ones
  const recent = messages.slice(-10)
  const old = messages.slice(0, -10)

  const summary = await llm.generate({
    system: "Summarize this conversation history, preserving key decisions, facts, and user preferences.",
    messages: old
  })

  return [
    { role: "system", content: `Previous context: ${summary}` },
    ...recent
  ]
}

Pros

  • Simple to implement
  • Reduces token usage by 60-80%
  • Works within single-session architecture

Cons

  • Lossy compression—details get dropped
  • Doesn't persist across restarts
  • Summarization quality varies
  • Still limited by context window

Approach 2: Initializer + Coding Agent Pattern

Anthropic developed a two-fold solution to enable the Claude Agent SDK to work effectively across many context windows: an initializer agent that sets up the environment on the first run, and a coding agent that is tasked with making incremental progress in every session, while leaving clear artifacts for the next session.

How It Works

  1. Initializer agent (session 1): Sets up project structure, creates TODO files, writes initial documentation
  2. Coding agent (sessions 2-N): Reads TODO, makes progress, updates artifacts for next session
  3. Artifacts: TODO.md, PROGRESS.md, STATE.json files that persist on disk

// Session 1: Initializer
await initializer.run({
  task: "Set up project for building authentication system",
  outputs: ["TODO.md", "ARCHITECTURE.md", "src/"]
})

// Session 2-N: Coding agent
while (!task_complete) {
  await coding_agent.run({
    instructions: "Read TODO.md, make progress, update TODO.md for next session",
    artifacts: ["TODO.md", "PROGRESS.md", "src/"]
  })
}

Benefits

  • Context persists in files, not memory
  • Each session can have fresh context window
  • Clear handoff mechanism between sessions
  • Works across restarts and days-long tasks

Limitations

  • Requires file system access
  • Agent must be trained to read/write artifacts
  • Not suitable for real-time conversational agents

Approach 3: Shared Memory Systems

"What we're almost talking about is just managing state over long periods of time," as one expert described the concept of persistent, shared memory for AI agents. This shared memory transcends the individual agent's internal state, acting as a centralized repository of information that all agents within a system can access and contribute to.

Architecture

class SharedMemory {
  async read(user_id: string, memory_type: string) {
    return await db.query({
      table: "agent_memory",
      where: { user_id, type: memory_type },
      order_by: "timestamp DESC",
      limit: 100
    })
  }

  async write(user_id: string, memory: Memory) {
    await db.insert({
      table: "agent_memory",
      data: {
        user_id,
        type: memory.type,
        content: memory.content,
        metadata: memory.metadata,
        timestamp: Date.now()
      }
    })
  }
}

// Usage in agent
const user_prefs = await memory.read(user.id, "preferences")
const past_decisions = await memory.read(user.id, "decisions")
const learned_facts = await memory.read(user.id, "facts")

// Inject into prompt
const context = `
  User preferences: ${JSON.stringify(user_prefs)}
  Past decisions: ${JSON.stringify(past_decisions)}
  Known facts: ${JSON.stringify(learned_facts)}
`

Memory Types

  • Preferences: User settings, communication style, tool preferences
  • Facts: Learned information about the user's domain, projects, team
  • Decisions: Past choices and their rationale
  • Tasks: Ongoing work, incomplete actions, reminders
  • Observations: Agent insights from past interactions
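The `Memory` value passed to `SharedMemory.write` above is never defined in the snippet. As a minimal sketch, the five memory types could be modeled as a tagged record; the type and field names here are assumptions for illustration, not an established schema:

```typescript
// Hypothetical shape for entries in the shared memory store.
// The five `type` values mirror the memory types listed above.
type MemoryType = "preferences" | "facts" | "decisions" | "tasks" | "observations"

interface Memory {
  type: MemoryType
  content: string                    // the remembered statement itself
  metadata: Record<string, unknown>  // e.g. source session, confidence
}

const MEMORY_TYPES: MemoryType[] = [
  "preferences", "facts", "decisions", "tasks", "observations"
]

// Small guard: reject malformed records before they reach the database.
function isValidMemory(m: Memory): boolean {
  return MEMORY_TYPES.includes(m.type) && m.content.trim().length > 0
}
```

Keeping the `type` field a closed union (rather than a free-form string) makes the store queryable by category, which is what the `memory.read(user.id, "preferences")` calls above rely on.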

Benefits

  • Works across sessions, agents, and systems
  • Structured, queryable memory
  • Agents can enrich memory with observations
  • Scales to millions of users

Approach 4: Model Context Protocol (MCP)

The Model Context Protocol (MCP) offers promising solutions to context management challenges. MCP provides a standardized framework for connecting AI models with external data sources and tools, enabling more effective context retention and sharing across agent interactions.

How It Works

MCP servers expose context as resources that any MCP-compatible agent can retrieve:

// MCP server exposing conversation history
server.resource({
  name: "conversation_history",
  description: "User's conversation history across sessions",
  uri: "history://{user_id}/{session_id}",
  handler: async ({ user_id, session_id }) => {
    const history = await db.getConversationHistory(user_id, session_id)
    return {
      mimeType: "application/json",
      content: JSON.stringify(history)
    }
  }
})

// Any MCP client can now access it
const history = await mcp.readResource("history://user123/session456")

Benefits

  • Standardized cross-session context access
  • Works with any MCP-compatible agent or tool
  • Composable with other MCP resources (files, databases, APIs)
  • Rapid adoption across the AI ecosystem since early 2025

Approach 5: Context Offloading + Retrieval

Context Engineering includes Context Offloading (moving information to external systems) and Context Retrieval (adding information dynamically).

Architecture

  1. Offload: Store full conversation history in external database
  2. Index: Embed conversations for semantic search
  3. Retrieve: Fetch relevant past conversations on-demand
  4. Inject: Add to current session context

async function retrieve_relevant_context(query: string, user_id: string) {
  // Embed current query
  const query_embedding = await embed(query)

  // Search past conversations
  const past_context = await vector_db.query({
    vector: query_embedding,
    filter: { user_id },
    top_k: 3
  })

  return past_context.map(c => ({
    timestamp: c.metadata.timestamp,
    summary: c.metadata.summary,
    decisions: c.metadata.decisions
  }))
}

// In current session
const past = await retrieve_relevant_context(user_query, user.id)
const prompt = `
  Relevant past conversations:
  ${JSON.stringify(past, null, 2)}

  Current query: ${user_query}
`

Benefits

  • Scales to unlimited conversation history
  • Retrieves only relevant past context
  • Reduces token usage compared to loading full history
  • Works across sessions and agents

Approach 6: Context Isolation + Scoping

Context Isolation is the practice of separating context by scope to prevent pollution and ensure agents only access relevant information.

Implementation

class ContextManager {
  scopes = {
    global: [],      // Available to all agents
    user: {},        // Per-user context
    session: {},     // Per-session context
    agent: {}        // Per-agent context
  }

  async getContext(user_id: string, session_id: string, agent_id: string) {
    return {
      global: this.scopes.global,
      user: this.scopes.user[user_id] || [],
      session: this.scopes.session[session_id] || [],
      agent: this.scopes.agent[agent_id] || []
    }
  }

  async isolate(scope: string, context: any) {
    // Ensure context in this scope doesn't leak to others
    this.scopes[scope] = sanitize(context)
  }
}

Scoping Strategies

Scope     Lifetime             Example Context
Global    Permanent            System instructions, API schemas, company policies
User      Permanent            Preferences, learned facts, historical decisions
Session   Hours to days        Current task state, recent messages, working memory
Turn      Single interaction   Retrieved documents, function call results
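The lifetimes in the table above can be enforced mechanically rather than by convention. A sketch of TTL-based expiry on read, with illustrative TTL values (the exact durations would be tuned per deployment):

```typescript
// Illustrative lifetimes per scope, in milliseconds (Infinity = permanent).
const SCOPE_TTL_MS: Record<string, number> = {
  global: Infinity,
  user: Infinity,
  session: 24 * 60 * 60 * 1000,  // "hours to days": one day here
  turn: 0                         // single interaction: never reused later
}

interface ScopedEntry { value: string; storedAt: number }

// Return only the entries still within their scope's lifetime.
function liveEntries(scope: string, entries: ScopedEntry[], now: number): ScopedEntry[] {
  const ttl = SCOPE_TTL_MS[scope] ?? 0  // unknown scopes expire immediately
  return entries.filter(e => now - e.storedAt <= ttl)
}
```

Filtering on read (rather than eagerly deleting) keeps the implementation simple; a background sweep can reclaim expired rows separately.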

Production Architecture: Multi-Tier Context

Production systems combine multiple approaches in a tiered architecture:

Tier 1: Hot Context (Always Loaded)

  • System instructions
  • User preferences
  • Current task state
  • Recent conversation (last 5-10 turns)
  • Target: Under 5K tokens

Tier 2: Warm Context (Cached, Frequently Accessed)

  • Conversation summaries from current session
  • Learned facts about user's domain
  • Recent decisions and their rationale
  • Frequently-accessed documents
  • Target: 20-50K tokens, cached for 5-60 minutes

Tier 3: Cold Context (Retrieved On-Demand)

  • Full conversation history (all sessions)
  • Past projects and outcomes
  • Rarely-accessed documents
  • Historical data and archives
  • Retrieved only when relevant to current query

Implementation

async function build_cross_session_context(
  user_id: string,
  session_id: string,
  current_query: string
) {
  // Tier 1: Hot (always loaded)
  const hot = {
    system: await getSystemInstructions(),
    user_prefs: await getPreferences(user_id),
    task_state: await getTaskState(session_id),
    recent: await getRecentMessages(session_id, 10)
  }

  // Tier 2: Warm (cached)
  const warm = await cache.get(`context_${session_id}`, async () => ({
    session_summary: await summarizeSession(session_id),
    learned_facts: await getLearnedFacts(user_id),
    recent_decisions: await getDecisions(user_id, 30)  // last 30 days
  }))

  // Tier 3: Cold (retrieved)
  const cold = await retrieveRelevantHistory(current_query, user_id)

  return { hot, warm, cold }
}
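The `cache.get(key, loader)` call in the warm tier above assumes a read-through cache with a TTL. A minimal in-memory sketch of that helper, matched to the usage shown (the interface is an assumption, not a specific library's API):

```typescript
// Read-through cache: return the cached value if still fresh,
// otherwise run the loader, store the result, and return it.
class TTLCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>()

  constructor(private ttlMs: number) {}

  async get(key: string, loader: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key)
    if (hit && hit.expiresAt > Date.now()) return hit.value  // warm hit: no recompute
    const value = await loader()                              // miss or expired: compute once
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs })
    return value
  }
}
```

With a 5-60 minute TTL, this keeps warm-tier summaries from being recomputed on every turn, at the cost of slightly stale context; in production the `Map` would typically be replaced by a shared store such as Redis.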

Observability: Monitoring Cross-Session Context

Track these metrics to ensure context persistence works:

Metric                    Target   What It Measures
Context recall rate       >90%     % of relevant past context successfully retrieved
Session continuity score  >85%     How well context flows across session boundaries
Memory utilization        50-70%   % of stored context actually used in responses
Contradiction rate        <5%      % of responses contradicting past statements
Re-ask rate               <10%     % of questions already answered in past sessions
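Two of these metrics are straightforward to compute from logged interactions. A sketch for re-ask rate and memory utilization, assuming each interaction is logged with the fields below (the field names are illustrative, not a standard schema):

```typescript
interface Interaction {
  question: string
  answeredInPastSession: boolean  // matched against the prior Q&A log
  contextIdsLoaded: string[]      // stored context injected into the prompt
  contextIdsUsed: string[]        // subset actually referenced in the response
}

// Re-ask rate: share of questions already answered in past sessions.
function reAskRate(log: Interaction[]): number {
  if (log.length === 0) return 0
  return log.filter(i => i.answeredInPastSession).length / log.length
}

// Memory utilization: share of loaded context the responses actually used.
function memoryUtilization(log: Interaction[]): number {
  const loaded = log.reduce((n, i) => n + i.contextIdsLoaded.length, 0)
  const used = log.reduce((n, i) => n + i.contextIdsUsed.length, 0)
  return loaded === 0 ? 0 : used / loaded
}
```

The harder part in practice is producing the labels: detecting that a question was "already answered" or that a context item was "actually used" typically requires an LLM judge or embedding-similarity check over the logs.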

Thread Transfer's Cross-Session Solution

Thread Transfer bundles are designed for cross-session context persistence. A bundle created from a Slack thread or Linear issue becomes a portable context unit that can be:

  • Stored in shared memory for access by any agent
  • Cached for fast warm-tier access
  • Retrieved semantically when relevant to new sessions
  • Injected deterministically to ensure consistency

Instead of re-processing 100-message Slack threads on every session, load the bundle once and get:

  • Key decisions and their rationale
  • Action items and owners
  • Relevant stakeholders
  • Timeline and milestones
  • Links to related threads

This delivers 40-80% token savings while enabling perfect context continuity across sessions, agents, and systems.
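As a sketch, a bundle like the one described could be modeled as a typed record with a deterministic renderer; this shape is an illustration mirroring the fields listed above, not Thread Transfer's actual schema:

```typescript
// Hypothetical bundle shape mirroring the fields listed above.
interface ContextBundle {
  source: { kind: "slack_thread" | "linear_issue"; id: string }
  decisions: { summary: string; rationale: string }[]
  actionItems: { description: string; owner: string }[]
  stakeholders: string[]
  milestones: { name: string; due: string }[]
  relatedThreads: string[]  // links to related bundles
}

// Deterministic injection: the same bundle always yields the same prompt text,
// so agents reading it across sessions see identical context.
function renderBundle(b: ContextBundle): string {
  return [
    `Source: ${b.source.kind} ${b.source.id}`,
    ...b.decisions.map(d => `Decision: ${d.summary} (${d.rationale})`),
    ...b.actionItems.map(a => `Action: ${a.description} (owner: ${a.owner})`),
    ...b.stakeholders.map(s => `Stakeholder: ${s}`)
  ].join("\n")
}
```

Determinism is what distinguishes this from re-summarizing the thread each session, where summarization variance can introduce the contradictions the observability section warns about.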

Enterprise Context Management

Fragmented RAG pipelines and application-specific context engineering can't keep up with enterprise agentic systems. Systematic, organization-wide context management is what will separate stalled AI experiments from accelerated agentic AI adoption.

Key Requirements

  • Metadata designed for AI: AI agents need rich, interconnected context that enables them to read, write, and act safely on enterprise data
  • Access control: Context must respect user permissions and data isolation
  • Audit trails: Track what context was accessed and how it influenced decisions
  • Version control: Context evolves; agents need access to correct versions

Best Practices

  1. Design for persistence from day 1: Don't bolt on cross-session support later
  2. Use tiered architecture: Hot, warm, cold context with different persistence strategies
  3. Implement shared memory: Central repository accessible by all agents
  4. Cache aggressively: Reduce re-computation and token costs
  5. Scope context appropriately: Global, user, session, turn
  6. Monitor continuity metrics: Ensure context actually persists
  7. Enforce access control: Context leakage is a security risk
  8. Summarize periodically: Prevent context bloat

The Path Forward

The answer to cross-session context isn't just bigger windows—it's smarter context orchestration. Systems that treat context as infrastructure, with proper persistence, retrieval, and lifecycle management, will enable truly autonomous agents that can work on tasks spanning days, weeks, or months.

Start with shared memory for user preferences and learned facts. Add conversation summaries for warm-tier access. Implement semantic retrieval for cold-tier history. Layer in caching to reduce costs. Build observability to ensure it works.

The goal: agents that remember, learn, and maintain continuity across every interaction.