Team Briefing

Huddle Mode

A real-time voice conversation with Mem. Like calling an assistant who already knows your world.

Our goal: ship a real-time voice conversation with Mem that feels like calling a human assistant who already knows our whole world — fluent recall in the moment, brain-dump capture without blocking, and heavier work executed during and after the call via the agent's full tool set.

How to read this: top sections are the briefing; lower sections add depth. The async execution model (the pillar we get into next) is the non-obvious crown jewel of this concept — we should make sure we've all internalized it before we start scoping.


TL;DR

Why this matters

Voice Mode today is well-received but constrained — we can only use it inside a specific note, and it can really only create or edit content within that note. What users actually want is to use their voice to interact with their entire Mem — not just a single note they're already sitting inside.

Huddle is the up-level: voice as the interaction surface for the whole product, not a dictation helper inside one file. Another way to say it: we already have great value propositions (Mem Agent's search + research + memory, Mem Chat's conversational intelligence, proactive recall via Heads Up); Huddle is the medium that exposes all of that through a new modality. Becoming multimodal is how we meet the user where they are — driving, walking between meetings, running errands, AirPods in.

The ICP skews toward busy operators whose work happens in motion. Voice is the interaction pattern that fits their life. A real-time conversation with an AI that already knows their world is dramatically higher-bandwidth than typed chat, and completely different from any generic voice assistant they've used.


The pillar that makes Huddle different: async execution

This is the single most important concept for us to get right. Most voice assistants feel robotic because the conversation is gated on work completion — you ask, it pauses to "think," then it responds. Huddle inverts that.

The model:

  1. During the huddle, the conversation is synchronous and flowing.
  2. When we ask for something non-trivial ("Consolidate all my Europe trip notes into one plan"), Mem reacts to the idea conversationally and queues the task silently in the background.
  3. The conversation continues — no awkward pause, no "processing…"
  4. The huddle engine keeps running after the call ends, executing queued work with proper thinking time.
  5. Results arrive asynchronously — via Mem Agent messaging us when it's done.

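The five steps above can be sketched as a tiny queue-based loop. This is an illustrative toy, not Mem's actual implementation — all class and method names here (`HuddleSession`, `handle_turn`, the "consolidate" keyword check standing in for real intent detection) are hypothetical:

```python
import queue
import threading

class HuddleSession:
    """Toy sketch of the async execution model: conversational replies are
    synchronous; heavy work is queued and executed off the conversation path."""

    def __init__(self):
        self.tasks = queue.Queue()    # silently queued background work
        self.results = []             # stand-in for post-call Mem Agent messages
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def handle_turn(self, utterance: str) -> str:
        # Decide whether the ask implies non-trivial work.
        # (A real system would use the model's tool-calling, not a keyword check.)
        if "consolidate" in utterance.lower():
            self.tasks.put(("consolidate_notes", utterance))
            # React conversationally right away -- no "processing..." pause.
            return "Sure, I'll pull that together."
        return "Oh, that's a good one."

    def _drain(self):
        # Worker keeps running during and after the call,
        # delivering results asynchronously once each task completes.
        while True:
            kind, payload = self.tasks.get()
            self.results.append(f"Done: {kind} for {payload!r}")
            self.tasks.task_done()

session = HuddleSession()
reply = session.handle_turn("Can you consolidate my Europe trip notes into one plan?")
session.tasks.join()   # in the real flow, draining continues after hang-up
```

The key property is that `handle_turn` returns immediately with a conversational reaction, while the queue drains on its own timeline — which is exactly why the call never stalls on "thinking."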

What this sounds like in a conversation:

You: Hey, I've got this idea for our onboarding flow — I think Teams admins should be able to invite via Slack DM, and we should stop assuming people know what Collections are in the welcome sequence.

Mem: Oh, that's a good one — the Slack DM thing especially, since that's where admins actually live. Does it connect at all to the pilot Kelly was pushing for?

(In the background: Mem queues up turning this dump into a structured note; it'll create it after the call ends. Nothing is said out loud about that — the conversation just keeps going.)

You: Yeah, good catch, it might. Anyway — also, can you consolidate my Europe trip notes into one plan?

Mem: Sure, I'll pull that together.

(Also queued.)

You: Great, I'll talk to you later.

(After the call, Mem executes both queued tasks. A few minutes later, a Mem Agent message: "Created 'Onboarding flow ideas' with your dump, and cross-referenced Kelly's pilot note." A few minutes after that: "Europe trip plan consolidated — here's the note.")

The conversation feels human because Mem reacts to the ideas, not to its own bookkeeping. The queued work is invisible until it lands. This pattern is what makes Huddle feel like an assistant rather than a slow model. Without this, Huddle is just Mem Chat with a microphone.


Core workflows (the hero paths)

1. Brain-dump capture without blocking

You call up Mem on a walk: "Hey, I've got this idea for a new onboarding flow — we should let Teams admins invite via Slack DM, and I think the welcome sequence needs to stop assuming people know what Collections are…"

Mem reacts to the idea: "Oh interesting — the Slack DM thing especially. Does that play at all into the pilot Kelly was pushing for?"

The conversation keeps going. Meanwhile, Mem has queued the dump as work. Back at your desk minutes later, there's a message from Mem: "Created 'Onboarding flow ideas' — cross-referenced Kelly's pilot note where relevant."

This is the workflow that does the most to define Huddle. The vibe should feel like talking to a sharp person who's listening and reacting, not to a transcription service.

2. Recall — "aware of your whole world"

"What's my KTN number?""362496."

"What were my follow-ups from the meeting with Kelly last week?""You said you'd send her the SOC 2 report."

"What should I focus on today?" → Grounded answer based on what you've been working on, your calendar, and pending commitments.

Mem already has context. We don't specify who Kelly is, which meeting, or what's recent. The bar for "real recall" is high — if Mem hedges or asks clarifying questions on prompts like these, the magic dies.

3. Queued multi-step work

You: "Can you go through all my notes on the Europe trip and consolidate them into one trip plan I can reference on the trip?"

Mem: "Yep — I'll pull that together. Anything else?"

You: "Also, remind me tomorrow morning to confirm the hotel."

Mem: "Got it."

Two distinct asks, both queued silently. The trip plan arrives as a Mem Agent message after the call; the reminder fires tomorrow.

4. Proactive contextual surfacing (dovetailing)

You: "Hey, remind me to schedule a meeting with Kelly for later this week."

Mem: "Sure, I'll remind you later today." (pause) "By the way — didn't you say you were going to send her the SOC 2 report? Want me to remind you about that one too?"

You: "Oh shoot, I forgot."

Mem: "No worries, I'll add it to the list."

You brought up Kelly; Mem dovetails on that topic and surfaces something related and useful. This is different from the time-based reminders below, which can be totally unrelated to what you're talking about — this one rides the current topic naturally.

5. Time-based reminders delivered when Mem has your attention

(Mid-conversation, natural pause.) "Oh — by the way, your rent check is due today."

When Mem has our attention anyway, it's a good moment to land a time-sensitive reminder. The huddle becomes a natural delivery channel for things that would otherwise fire as an OS notification we'd ignore.


Milestones

M1 — Pick the voice stack

Before we write any app code, we need a conversation engine. Evaluate the candidates (Gemini Live, OpenAI Realtime, xAI voice, Anthropic if viable, AssemblyAI) and pick one.

Pick based on latency, barge-in support, tool-calling fidelity, and which form factor each makes easiest (iOS vs. web vs. desktop). The voice stack choice likely forces our form-factor decision — if one has a clean iOS SDK, iOS first; otherwise web/desktop.

Latency budget: Mem starts responding within ~500ms of the user pausing. Anything slower kills the natural feel.
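When benchmarking candidate stacks, the ~500ms bar is easy to check mechanically. A minimal sketch — the timestamps would come from the voice stack's end-of-speech (VAD) and first-audio callbacks, and the function name here is our own, not any vendor API:

```python
LATENCY_BUDGET_S = 0.5   # ~500 ms from end of user speech to first agent audio

def within_budget(user_stopped_at: float, first_audio_at: float) -> bool:
    """Return True if time-to-first-audio meets the M1 latency bar.
    Timestamps should share one monotonic clock (e.g. time.monotonic())."""
    return (first_audio_at - user_stopped_at) <= LATENCY_BUDGET_S

# Example during stack evaluation: a 420 ms response passes, 700 ms fails.
print(within_budget(0.0, 0.42))   # within budget
print(within_budget(0.0, 0.70))   # over budget
```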

M2 — Capture + queued execution

M3 — Recall over the user's Mem

M4 — Proactive contextual surfacing

Bonus milestone — Mem calls you (stretch with high demo payoff)


More example vignettes

Use these to pressure-test whether scope feels right. If a demo doesn't naturally cover several, we probably need to adjust.


P2 / stretch ideas (if we land M1–M4)

iOS as a form factor (may actually be first)

iOS is the natural hero surface for the ICP — AirPods-in, on the move. If the voice stack we pick has a clean iOS SDK, we should start there. If it's mostly desktop/web, start there and treat iOS as a clear next step. Either way, the architecture for the voice session, recall, and work queue should be portable.

Continued agent execution and end-of-huddle artifacts

When the call ends, the huddle agent keeps running and emitting outputs — text messages summarizing what it did, draft emails delivered to the user's inbox, notes appearing in Mem, reminders scheduled for later. This isn't really "stretch" so much as a necessary property of M2, but worth calling out as something to polish: the post-hang-up experience should feel like a competent assistant doing the work, not like the call just ended.

Multi-tasking during huddle

On desktop: the huddle continues in a compact UI while the user uses Mem's main app. Worth exploring if we land M1–M3 with room to spare.


Key decisions to make on Day 1

  1. Voice stack. Pick one realtime voice API. Candidates include Gemini Live, OpenAI Realtime, xAI voice, Anthropic (if viable), AssemblyAI. Evaluate on latency, barge-in, tool-calling fidelity, and SDK quality per form factor. This decision will likely force #2.
  2. Form factor. iOS vs. web vs. desktop first, largely downstream of #1. Keep the architecture portable regardless — this is a medium we'll be in everywhere eventually.
  3. Invocation gesture. A button in the Mem app / Floating Mem bubble / a hotkey / a wake word. Recommendation: start with a button on whichever platform we choose; wake word can wait.
  4. Recall implementation. Reuse Mem's search-and-retrieval toolkit — the same one Mem Agent already uses. Don't build a new retrieval path for voice.
  5. Work queue representation. Not user-visible. Work is tracked inside the huddle agent; results are messaged to the user via Mem Agent after the call. Each queued task can spin up its own Mem Chat / Mem Agent session if that helps execution; we have a choice to make about which harness.
  6. Proactive-surface cadence. Start conservative (≤1 unsolicited surface per ~60s). Tune from there based on feel.

What "done" looks like for the week

Minimum demo (M1 + M2):

Strong demo (M1 + M2 + M3):

Wow demo (M1 + M2 + M3 + M4):

Top-tier demo (add Bonus milestone):


Things Huddle is not (for this hack week)