Foundation Models Prompting Guide


On-device Foundation Models reward careful prompt design. The model runs with a smaller context window and less headroom for long instructions. When a prompt is vague, overloaded, or padded with filler text, output quality drops fast.

Why on-device prompting needs a different playbook

Many prompting patterns were popularized around large server models that can absorb long context and handle heavier reasoning. On-device models have tighter limits, so prompt quality becomes part of correctness. If you want accurate answers with fewer hallucinations, your prompts need to stay concise and explicit.

What you should plan for during development

  • Write simple, unambiguous instructions
  • Iterate based on test output
  • Provide a dedicated place for reasoning when you need it
  • Reduce the amount of reasoning required
  • Break complex work into smaller requests
  • Keep conditional behavior lightweight in the prompt, move branching into code when possible
  • Use example-based prompting (zero-shot, one-shot, few-shot) when format and behavior need anchoring
  • Test prompts across real-world inputs and judge results like any other user-facing feature

Prompts are structured input, not casual text

Prompt engineering in Apple’s framing is about shaping requests: phrasing, context, and formatting. A model follows patterns. Your job is to provide a pattern that stays readable and stable under variation.

A simple conditional pattern that stays on-device friendly

If the input is a question, answer it. If it reads like a statement, ask a clarifying question.

let policy = """
Decide what the person wrote.
If it is a question, answer it in one paragraph.
If it is a statement, ask one follow-up question.
"""
let session = LanguageModelSession(instructions: policy)
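
A quick sketch of exercising this policy, continuing from the `session` above. The `respond(to:)` call and its `content` property follow the FoundationModels API; the sample inputs are invented:

```swift
// A question should come back with a one-paragraph answer.
let answer = try await session.respond(to: "What time does the shop open?")
print(answer.content)

// A statement should come back as a single follow-up question.
let followUp = try await session.respond(to: "I visited the shop yesterday.")
print(followUp.content)
```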

Keep prompts simple and clear

On-device models work best with short prompts that a human can understand at a glance. The model processes tokens, and the context window is limited. Long prompts waste tokens on text that does not help the task.

Guidelines that improve reliability

  • Use one well-defined goal per prompt
  • Lead with imperative verbs (List, Generate, Rewrite, Classify)
  • Assign a role when it helps the model stay in the right register
  • Prefer short, active sentences
  • Keep prompts to one to three paragraphs when possible
  • Avoid jargon and ambiguous phrasing
  • Avoid hedging and social padding

Two prompts that ask for the same thing

The first version keeps the task visible. The second version hides the task inside explanation.

let conciseTask = """
From the person’s recent browsing and purchases in home decor, produce five interest tags.
Order tags by relevance.
Add two extra tags from the same domain that are not mentioned in the input.
"""

let bloatedTask = """
The person will provide recent purchases and browsing related to home decor.
Your output should contain a list of categories that reflect what they may like.
Order categories so the most relevant appear early.
Also include additional categories that create new ideas beyond the input.
"""

When prompts get longer, the model spends tokens interpreting setup. The task sentence becomes harder to spot. Clarity usually improves results and reduces token use.
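
One way to keep the task sentence visible in practice is to hold the task in the instructions and pass only the raw activity as the prompt. A sketch using `conciseTask` from above; the activity string and the `respond(to:)` usage are illustrative:

```swift
let session = LanguageModelSession(instructions: conciseTask)

// The prompt carries only data; the instructions carry the task.
let activity = """
Browsed: linen curtains, ceramic vases, rattan lamp shades.
Purchased: a walnut side table.
"""
let tags = try await session.respond(to: activity)
print(tags.content)
```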

Give the model a role, persona, and tone

The system model tends to respond in a neutral, business-casual voice. You can guide its behavior by describing who it is and who it is speaking to.

Role and persona

A role describes the job. A persona describes how that job is performed.

let mentorStyle = """
You are a senior iOS engineer who enjoys mentoring.
Explain the concept using practical guidance and short examples.
Avoid academic language.
"""
let session = LanguageModelSession(instructions: mentorStyle)

Using “expert” to raise specificity

The word “expert” can push the model toward more confident, detailed output.

let expertReviewer = """
You are an expert code reviewer.
Review the person’s sentence for clarity and precision.
Return three concrete improvements.
"""

Defining the user

The model assumes it is speaking to an average adult. When your app serves a specific audience, tell it.

let beginnerAudience = """
The person is new to programming.
Explain the idea with simple words and define any technical term you use.
"""

Setting tone by example

Tone often follows the voice of the instructions.

let calmProfessional = """
Write in a calm, professional style.
Keep sentences short.
Avoid humor and slang.
"""

Iterate to improve instruction following

Instruction following is the model’s ability to execute what you wrote in Instructions and the user’s Prompt. On-device prompting improves through iteration and testing.

Use these three adjustment levers

  • Improve clarity by rewriting the instruction
  • Add emphasis with a small number of constraint words (must, do not, avoid)
  • Repeat the most important constraints once at the end

Here is an example of emphasis and a repeated constraint without turning the prompt into a wall of rules:

let rules = """
Summarize the text in three bullet points.
Each bullet must be a single sentence.
Do not include extra commentary.

Repeat: output exactly three bullet points.
"""
let session = LanguageModelSession(instructions: rules)

If a prompt remains unreliable after a few iterations, reduce complexity. A compact baseline prompt can be a useful test:

let baseline = "Answer the person’s question."
let session = LanguageModelSession(instructions: baseline)

If the baseline is unstable, the task may not fit the model’s capabilities or your input distribution.

Reduce how much thinking the model needs to do

On-device models have limited reasoning capacity compared to larger models. When a task requires planning, provide a plan. The goal is to convert a vague objective into a small procedure.

A step-by-step plan inside a single request

let stepPlan = """
Given the person’s activity related to running shoes:

1. Pick four product categories that match the activity.
2. Add two related categories that are not explicitly mentioned.
3. Return a single list ordered by relevance.
"""
let session = LanguageModelSession(instructions: stepPlan)

Splitting complex work into multiple sessions

A single request often gives better latency. When output quality is inconsistent, splitting steps across sessions can improve stability by resetting the context window.

let extractFacts = """
Extract key facts from the person’s message.
Return them as short bullet points.
"""
let generateResponse = """
Using the extracted facts, draft a helpful reply.
Keep it under 120 words.
"""

let extractionSession = LanguageModelSession(instructions: extractFacts)
let responseSession = LanguageModelSession(instructions: generateResponse)

This pattern also makes debugging easier because you can see where drift starts.
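
Wiring the two sessions together is straightforward: the first session’s output becomes part of the second session’s prompt. A sketch, assuming the FoundationModels `respond(to:)` API; the message text is invented:

```swift
let message = """
Hi, I ordered the blue ceramic mug last Tuesday and it arrived chipped.
Can I get a replacement?
"""

// Step 1: extract facts with the first session.
let facts = try await extractionSession.respond(to: message).content

// Step 2: feed the extracted facts to the second session.
let reply = try await responseSession.respond(to: "Facts:\n\(facts)").content
print(reply)
```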

Turn conditional prompting into programming logic

Conditionals inside a prompt are tempting. As the rule list grows, on-device models can lose track and apply irrelevant conditions. Apple’s suggestion is to move branching into Swift and inject only the relevant instruction.

A prompt with multiple in-text conditions

let tangled = """
You are a friendly shopkeeper. Greet the visitor.
IF the visitor is a wizard, mention their staff.
IF the visitor is a musician, ask about tonight’s performance.
IF the visitor is a guard, ask about safety in the city.
There is one small room and one large room available.
"""

A runtime-customized prompt built from code

enum VisitorKind { case wizard, musician, `guard`, unknown }

func visitorNote(for kind: VisitorKind) -> String {
    switch kind {
    case .wizard:
        return "The visitor is a wizard. Comment briefly on their magical gear."
    case .musician:
        return "The visitor is a musician. Ask if they plan to play here tonight."
    case .`guard`:
        return "The visitor is a guard. Ask if there have been recent incidents."
    case .unknown:
        return "The visitor looks tired from travel."
    }
}

let base = """
You are a friendly shopkeeper.
Write a greeting for a visitor who just arrived.
"""

let kind: VisitorKind = .musician // determined at runtime in a real app
let note = visitorNote(for: kind)

let instructions = """
\(base)
\(note)
Mention that one small room and one large room are available.
"""

let session = LanguageModelSession(instructions: instructions)

This keeps irrelevant condition branches out of the context window.

Provide simple input-output examples

When you need stable format, small examples help. Apple recommends few-shot prompting with simple examples, and suggests a range of 2–15. Keep each example short. Overly detailed examples can lead to repetition and made-up details.

A few-shot prompt with a lightweight structure

let examples = """
Create a fictional café customer for a cozy game.
Return an object with fields: name, look, order.

Examples:
{name: "Juniper", look: "a sleepy fox in a knit scarf", order: "hot cocoa with cinnamon"}
{name: "Mara", look: "a tiny robot with a cracked screen", order: "iced latte, extra ice"}
{name: "Orin", look: "a painter with ink-stained fingers", order: "espresso, no sugar"}
"""
let session = LanguageModelSession(instructions: examples)
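
Continuing from the session above, a short sketch of asking for a new customer in the same shape; the `respond(to:)` call follows the FoundationModels API:

```swift
let guest = try await session.respond(to: "Create one more customer.")
print(guest.content)
```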

Use guided generation when structure matters

Guided generation helps you constrain output shape using @Generable. Few-shot examples can reinforce the intended fields and style, and guided generation enforces the schema.

@Generable
struct CafeGuest: Equatable {
    let name: String
    let order: String
    let look: String
}

Then generate:

let instructions = """
Create a new café guest for a story game.
Keep each field short.
"""
let session = LanguageModelSession(instructions: instructions)

let prompt = "A customer who looks like they came from a rainy street."
let guest = try await session.respond(to: prompt, generating: CafeGuest.self).content

Handle reasoning safely with a dedicated field

Reasoning prompts can leak “working” text into structured output when the model has nowhere to put it. The recommended pattern adds a reasoning field as the first property, then constrains the final answer field.

Here is the same pattern with different naming and a simpler target type:

@Generable
struct Answer {
    var workLog: String

    @Guide(description: "The answer only.")
    var result: String
}

Prompt it explicitly:

let instructions = """
Answer the person’s question.
1. Write a short plan in workLog.
2. Follow the plan and show intermediate steps in workLog.
3. Put the final answer in result.
"""
let session = LanguageModelSession(instructions: instructions)

let prompt = "How many days are there in November?"
let response = try await session.respond(to: prompt, generating: Answer.self)

This does not guarantee perfect reasoning. It gives the model a safe place to put it, which protects the shape of your output.

Checklist

  • Simple, clear instructions
  • Iteration based on test output
  • A reasoning field before answering when needed
  • Reducing reasoning burden with step plans
  • Splitting complex prompts into smaller requests, including multiple sessions
  • Conditional behavior, plus the recommendation to move branching into Swift
  • Shot-based prompting with small examples, plus guidance on example count and simplicity