Greetings, traveler!
Performance work in mobile apps often starts with a familiar question: why does this screen feel slow? The answer rarely sits in one place. It might be in the network, in the way the UI updates, in how data flows through the app, or in how memory is used under the hood. Looking at a single function or a single optimization rarely explains the full picture.
In practice, performance is easier to reason about as a system. A tap on the screen triggers a chain of events: a request is sent, data is processed, views are updated, and the result is rendered. Each step adds its own cost. Improving one part can shift the bottleneck somewhere else. Without a clear model, it is easy to spend time optimizing code that has no visible impact on the user.
Another complication comes from perception. Users do not measure frame times or CPU usage. They react to how quickly the interface responds and how predictable it feels. A screen that renders partial content early can feel faster than one that waits and shows everything at once, even if the total work is the same. That difference shapes how performance decisions should be made.
This article approaches performance from a layered perspective. Instead of listing isolated tips, it breaks the problem into several levels: product perception, metrics, architecture, UI rendering, networking, data handling, runtime behavior, and low-level execution. Each level introduces its own constraints and trade-offs. Understanding how they connect makes it easier to decide where to focus and when an optimization is worth the effort.
The goal is to build a mental model that holds across different parts of an iOS application. With that model in place, performance work becomes less about guessing and more about making deliberate decisions based on how the system actually behaves.
Product-level performance
Performance work often starts in the wrong place. The first instinct is to look at code, measure execution time, and search for bottlenecks. That approach misses a more basic question: does the user actually experience a problem?
From a product perspective, performance is tied to how the interface responds to user actions. A delay becomes noticeable when there is no feedback, when the screen appears empty, or when interaction feels inconsistent. The same amount of work can feel fast or slow depending on how it is presented.
What users actually perceive
Users do not see CPU usage, memory allocations, or network traces. They notice when something appears on the screen, when they can start interacting, and whether the experience remains smooth.
Three aspects tend to shape that perception.
The first is time to first feedback. After a tap, even a partial response changes how the delay is perceived. A placeholder, a skeleton, or a quick transition signals that the system has started working.
The second is continuity. Gaps in rendering or pauses between updates draw attention immediately. A screen that fills in gradually can feel more responsive than one that waits and renders everything at once.
The third is predictability. Consistent timing is easier to accept than occasional spikes. An interface that behaves the same way each time feels more reliable, even if it is not the fastest possible.
Techniques that change perception
Several common techniques work because they address perception directly rather than reducing total execution time.
Progressive rendering breaks the work into stages. The layout appears first, followed by content. Users begin scanning the screen earlier, which shortens the perceived wait.
Skeleton views serve a similar role. They show the structure before the data arrives and reduce uncertainty about what is coming next. The total time may remain unchanged, but the waiting period feels shorter.
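As an illustration, here is a minimal SwiftUI sketch of the pattern (the view and loader names are hypothetical): rows render immediately as redacted placeholders and are replaced when data arrives.

import SwiftUI

struct FeedView: View {
    @State private var items: [String]? = nil   // nil while loading

    var body: some View {
        List {
            if let items {
                ForEach(items, id: \.self) { item in
                    Text(item)
                }
            } else {
                // Same layout, redacted content: structure appears first.
                ForEach(0..<8, id: \.self) { _ in
                    Text("Placeholder title")
                        .redacted(reason: .placeholder)
                }
            }
        }
        .task {
            items = await loadItems()   // hypothetical loader
        }
    }

    private func loadItems() async -> [String] {
        // Simulated fetch; a real implementation would hit the network.
        try? await Task.sleep(nanoseconds: 500_000_000)
        return (1...20).map { "Item \($0)" }
    }
}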
Deliberate delays can also play a role in specific scenarios. Instant responses are not always interpreted as correct or trustworthy. In flows where users expect some form of processing, a small delay can make the result feel more credible. This is context-dependent and needs to be used carefully, but it shows that raw speed is not the only variable.
Optimistic updates
Another way to improve perceived performance is to avoid waiting for the backend when the result can be safely predicted locally.
For some actions, the app already knows the next UI state. A favorite button, a bookmark, a local preference, or a draft can be updated immediately. The backend request still happens, but it moves out of the critical path of user feedback.
func toggleFavorite(id: Item.ID) {
    let previousValue = store.item(id).isFavorite
    let newValue = !previousValue

    // Apply the expected state immediately.
    store.updateItem(id) { item in
        item.isFavorite = newValue
    }

    // Reconcile with the backend outside the feedback path.
    Task {
        do {
            try await api.setFavorite(id: id, isFavorite: newValue)
        } catch {
            // Roll back and surface the failure.
            store.updateItem(id) { item in
                item.isFavorite = previousValue
            }
            showSyncError()
        }
    }
}

This changes the perceived latency of the interaction. Instead of waiting for a network round trip before the UI changes, the user receives immediate feedback and the system reconciles with the backend afterwards.
The trade-off is consistency. Optimistic updates require a rollback strategy, conflict handling, and clear UI states for pending or failed synchronization. They are useful when the failure rate is low and the local result is easy to reverse. They are risky when the action has financial, legal, or irreversible consequences.
In those cases, the safer approach is to keep the operation explicit: show progress, prevent duplicate submissions, and wait for server confirmation before presenting the final state.
Testing under constrained performance
A useful practical check is to test render-heavy screens with Low Power Mode enabled.
Low Power Mode does not simulate an older device. It does not change memory bandwidth, device architecture, refresh rate, or OS behavior. However, it does put the current device into a more constrained performance state. iOS may reduce CPU and GPU performance, which makes it easier to expose frame drops on screens with expensive layout, drawing, compositing, or media rendering.
This is especially useful when the team does not have access to a wide device lab. It should not replace testing on older hardware, but it can serve as a cheap stress test during development.
The goal is not to prove that the screen performs well on all devices. The goal is to reveal whether the screen has little performance headroom. If a feed, animation, or media-heavy screen starts dropping frames under Low Power Mode, that is a good signal to inspect layout cost, view invalidation, image decoding, transparency, and compositing.
Launch time
Launch time deserves special attention because it affects the entire user base before users receive any value from the app. One extra second may look small during local testing, but across thousands or millions of launches it becomes a large amount of accumulated waiting time.
In mature applications, startup often becomes an accumulation point for historical decisions. Crash reporting, analytics, logging, dependency injection, database setup, remote configuration, feature flags, SDK initialization, and feature services gradually move into the early launch path. Each item may look reasonable in isolation, but together they delay the first meaningful screen.
The first step is to separate startup work by criticality. Only work required to decide and render the initial route should block launch: minimal environment configuration, auth state, required feature flags, and the minimal service graph for the first screen. Everything else should be deferred, lazy-initialized, or started after the first screen appears.
Parallelism can help, but only when startup tasks are truly independent. In larger apps, explicit dependencies are safer than relying on manual ordering. A launch system should make it clear which tasks depend on which prerequisites, which tasks are critical, and which can run later.
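One possible shape for such a system, sketched with hypothetical names: each task declares its prerequisites and criticality, and the scheduler runs only critical tasks before the first screen.

struct LaunchTask {
    let name: String
    let dependsOn: [String]
    let isCritical: Bool            // must finish before the first screen
    let run: () async -> Void
}

final class LaunchScheduler {
    private var finished: Set<String> = []

    // Runs tasks whose dependencies are satisfied; a real implementation
    // would parallelize independent tasks and validate the graph.
    func run(_ tasks: [LaunchTask], criticalOnly: Bool) async {
        var pending = tasks.filter { !criticalOnly || $0.isCritical }
        while let index = pending.firstIndex(where: { task in
            task.dependsOn.allSatisfy(finished.contains)
        }) {
            let task = pending.remove(at: index)
            await task.run()
            finished.insert(task.name)
        }
    }
}

Deferred tasks then run in a second pass after the first screen appears.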
Pre-main time also matters. Before application code runs, the system loads the executable and dynamic libraries, performs fixups, prepares runtime metadata, and runs initializers. A large number of dynamic frameworks or expensive global initialization can increase cold launch time. Static linking may help for modules that are always needed at startup, while dynamic frameworks still make sense when code can be loaded later or provides a real modular boundary.
Launch optimization should be measured separately for cold and warm starts. Useful breakdowns include pre-main time, application initialization, first screen creation, first meaningful render, and deferred startup completion. Without this separation, it is easy to optimize the wrong part of the launch path.
At the production level, the goal is not only to make startup faster once, but to protect the launch path over time. Every new SDK, service, framework, or startup task should justify why it belongs before the first meaningful screen.
Launch time should be protected with automated regression checks. A UI performance test can measure startup on a release-like build, compare the result against a baseline, and notify the team when the threshold is exceeded. This does not replace Instruments or production data, but it prevents launch time from slowly degrading as new startup work is added across many pull requests.
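A minimal version of such a check using XCTest's built-in launch metric (baselines and thresholds are configured per project):

import XCTest

final class LaunchPerformanceTests: XCTestCase {
    func testColdLaunchTime() throws {
        // Measures launch time across several runs using the system metric.
        // Run this against a release-like build for meaningful numbers.
        measure(metrics: [XCTApplicationLaunchMetric()]) {
            XCUIApplication().launch()
        }
    }
}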
Launch performance tests are often better suited for a scheduled pipeline than for every pull request. Startup measurements are sensitive to device state, CI load, caching, and background activity, so a nightly run can collect multiple samples in a more stable environment without slowing down the normal development feedback loop.
Deciding what to optimize
Not every delay needs to be reduced. Performance work becomes effective when it focuses on parts of the system that change user behavior.
A useful approach is to treat optimization as a hypothesis. Instead of asking how to make something faster, start by asking whether it matters.
One way to answer that question is to introduce controlled delays. Adding 100 or 200 milliseconds to a specific step can reveal whether users notice the difference. If engagement and conversion remain unchanged, that part of the system is unlikely to be a priority. If metrics shift, the impact is real and worth investigating.
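A sketch of how such a delay might be injected behind an experiment flag (`Experiment`, the flag name, and the surrounding functions are hypothetical):

func loadFeedWithExperiment() async throws -> [Item] {
    // Injected only for the test cohort; removed once the question is answered.
    if Experiment.isEnabled("feed_delay_150ms") {
        try await Task.sleep(nanoseconds: 150_000_000)
    }
    return try await loadFeed()
}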
This also highlights a common pattern. Improvements have diminishing returns. Reducing a delay from one and a half seconds to one second changes how the interface feels. Reducing it from 300 milliseconds to 200 milliseconds often goes unnoticed. Past a certain point, further optimization increases complexity without affecting the outcome.
Performance and user behavior
Changes in performance influence how users interact with the product. That effect can be measured.
When the main screen appears earlier, users start scrolling sooner. More content becomes visible, and interaction begins earlier in the session. When scrolling remains smooth, users tend to explore more items. When delays disappear from critical flows, fewer sessions are abandoned.
In one case, reducing the time to render the main screen changed how users moved through the feed. Earlier access to content increased engagement and led to more downstream actions. The code changes were local, but the effect showed up in product metrics.
This connection is easy to overlook. Performance work may look like a technical improvement, yet it often alters user behavior in ways that matter for conversion, retention, and revenue.
Trade-offs and side effects
Optimizations can introduce secondary effects. Faster execution in one part of the system can create pressure elsewhere.
Rendering content earlier may expose incomplete data if supporting requests have not finished. Faster interactions can increase load on backend services because users perform more actions in less time. Changes that reduce waiting can also affect how features such as ads or recommendations are delivered.
There are also cases where speed reduces clarity. When results appear immediately in flows that imply processing, users may question their validity. A small delay can make the system feel more deliberate.
These effects do not invalidate optimization efforts. They show that performance changes the system as a whole, not just the duration of individual operations.
Closing thought
At this level, performance work is about deciding where improvement matters. The goal is to identify delays that affect perception and behavior, then focus on those parts of the system.
Once that decision is made, it becomes easier to move down the stack and analyze how the work is executed.
Metrics and observability
Once a performance issue is visible at the product level, the next step is to make it measurable. Without that, every discussion turns into guesswork. Two engineers can look at the same screen and come to different conclusions, both based on limited evidence.
Metrics turn perception into something that can be inspected, compared, and tracked over time. They answer a simple question: where exactly does the delay come from?
From perception to measurement
A complaint like “this screen feels slow” contains very little actionable information. It does not say whether the delay happens before the first content appears, during data loading, during view updates, or while rendering. It also does not say whether the issue affects all users or only a subset.
This is where observability begins. Instead of treating performance as a single number, the system needs to capture how time is spent across the entire interaction.
User-centric metrics
Not all measurements are equally useful. Metrics become meaningful when they follow the path of the user.
Time to first content reflects when the screen starts to show useful information. It often determines the first impression of responsiveness.
Time to interactive reflects when the user can actually do something with the interface. A screen that appears quickly but remains unresponsive still feels slow.
Scroll smoothness captures continuity. Frame drops and hitches are easy to notice, especially in content-heavy screens.
End-to-end latency connects these points into a single flow: from a tap to the moment when the result is rendered and ready.
These metrics describe the experience rather than the implementation. They do not explain why something is slow, but they show where to look.
Breaking down latency
A single duration rarely points to a solution. Knowing that a screen opens in 800 milliseconds does not reveal which part of the system consumes that time.
Breaking the flow into stages changes that. A typical path includes input handling, network requests, data processing, state updates, view updates, and rendering. Each stage can be measured separately.
In practice, this decomposition often reveals uneven distribution. One stage may dominate the timeline while others remain relatively small. In some cases, multiple smaller delays accumulate and create the perception of slowness.
This level of detail is what turns a vague problem into a concrete investigation.
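Signposts are one way to capture these stages so they show up as intervals in Instruments. A sketch, assuming hypothetical fetchFeed, parse, and show functions:

import os

let signposter = OSSignposter(subsystem: "com.example.app", category: "FeedScreen")

func openFeed() async throws {
    let id = signposter.makeSignpostID()
    let interval = signposter.beginInterval("OpenFeed", id: id)

    let data = try await fetchFeed()            // network stage
    signposter.emitEvent("FeedFetched", id: id)

    let items = parse(data)                     // processing stage
    signposter.emitEvent("FeedParsed", id: id)

    await show(items)                           // UI update stage
    signposter.endInterval("OpenFeed", interval)
}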
Local and production measurements
Local profiling provides depth. Instruments can show CPU usage, allocations, call stacks, and precise timings. This makes it possible to understand how the code behaves under controlled conditions.
Production data provides scale. Real users run the application on different devices, under varying network conditions, and with different usage patterns. This environment cannot be reproduced locally.
These two sources serve different purposes. Local tools help identify the cause. Production metrics show how widespread the issue is and whether it affects critical scenarios.
Tooling in context
Tools matter, but only in relation to the questions they answer. Time Profiler shows where CPU time is spent. It answers “what is expensive”.
SwiftUI-specific tools add another dimension. They show when SwiftUI performs updates, which updates are long, and how often they occur. This helps answer a different question: “is the UI system itself the bottleneck”.
Even more importantly, modern tooling can explain why updates happen. Instead of only looking at call stacks, you can inspect cause-and-effect relationships: which state change triggered which view update, and how that change propagated through the system.
This distinction matters. In imperative UI frameworks, a backtrace often explains why a view updated. In declarative systems, updates are driven by dependencies. Understanding those dependencies is often more valuable than inspecting call stacks.
Production observability relies on aggregated metrics. Systems like MetricKit or internal dashboards track latency distributions, percentiles, and trends over time.
Experiments connect performance changes to outcomes. A feature can be rolled out to a subset of users, and its impact can be measured before a wider release.
Common pitfalls
Several patterns tend to appear when metrics are introduced. A single number often becomes the focus. Average latency looks stable while tail latency grows. Users who experience the slowest cases are then overlooked.
Measurement points may not align with what users see. A metric can be recorded before the content is actually visible, which leads to misleading conclusions.
Another common issue is focusing only on expensive operations. In UI systems, performance problems are often caused not by a single slow operation, but by too many updates happening within the same frame.
Production monitoring tools
Local profiling and production monitoring should complement each other. Local tools help explain why something happens. Production tools show how often it happens, on which devices, in which app versions, and under which real-world conditions.
Instruments is the right tool when you need depth. It helps inspect CPU usage, memory allocations, hangs, hitches, rendering, and power behavior under controlled conditions. It is especially useful when you can reproduce a scenario locally and need to understand the cause.
MetricKit works differently. It passively collects performance and diagnostic data from real usage and delivers aggregated reports to the app. This makes it useful for detecting regressions across versions, devices, and operating system releases, but it is not a real-time debugging tool. Metric reports are delivered at most once per day, so MetricKit is better suited for trend analysis than immediate investigation.
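Subscribing takes little code. A minimal sketch; the report call stands in for whatever upload pipeline the app uses:

import MetricKit

final class MetricsObserver: NSObject, MXMetricManagerSubscriber {
    func start() {
        MXMetricManager.shared.add(self)
    }

    // Payloads arrive aggregated, roughly once per day.
    func didReceive(_ payloads: [MXMetricPayload]) {
        for payload in payloads {
            if let launch = payload.applicationLaunchMetrics {
                // Histogram of time to first draw across real launches.
                report(launch.histogrammedTimeToFirstDraw) // hypothetical uploader
            }
        }
    }
}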
Third-party tools such as Firebase Performance or New Relic Mobile provide hosted dashboards, traces, version comparisons, and network monitoring. They reduce the amount of infrastructure a team needs to build, but they come with trade-offs in flexibility, cost, data ownership, and integration depth.
Larger teams often add custom performance analytics on top of existing product analytics. This gives full control over what is measured and how it is connected to business flows, but it also means maintaining instrumentation, sampling, dashboards, privacy rules, and alerting.
The important point is that production performance should not be judged from debug builds or casual testing. Debug builds, attached debuggers, excessive logging, and non-representative test data can distort results. They are useful for finding obvious issues, but decisions about user-facing performance should be based on release-like builds and production data.
Closing thought
At this level, the goal is to locate the problem with enough precision to act on it. Once the critical path is visible and measured, the focus can shift to why the system performs this work and how to reduce it.
Architecture and data flow
Metrics show where time is spent. Architecture explains why that time exists in the first place. In many cases, the issue is not a slow operation, but the amount of work triggered by a single change.
A user action should result in the minimal amount of work required to produce the next state. When the system performs more work than necessary, latency grows even if each individual step is efficient.
Work as the primary cost
Each state mutation can trigger a chain of updates:
- state changes
- dependent components recompute
- data is transformed
- UI is updated
If this chain is too wide, even small changes become expensive. The cost comes from the number of affected elements, not from the complexity of a single computation.
Data flow and dependencies
The structure of dependencies determines how updates propagate. In SwiftUI, a common issue is a shared model that feeds multiple parts of the screen:
final class ScreenViewModel: ObservableObject {
    @Published var title: String = ""
    @Published var items: [Item] = []
}

struct ScreenView: View {
    @StateObject var viewModel = ScreenViewModel()

    var body: some View {
        VStack {
            Text(viewModel.title)
            List(viewModel.items) { item in
                ItemView(item: item)
            }
        }
    }
}

A change in one property can cause unrelated parts of the UI to update. The problem is not the framework, but the breadth of dependencies.
Declarative frameworks make this behavior more visible. Views declare what data they depend on, and the system updates them when that data changes. If the dependency is broad, updates propagate further than intended.
A typical pattern is indirect dependency on shared state. For example, if multiple views read from the same collection or shared object, any mutation of that collection may invalidate all of them, even if only one element actually changed.
This leads to a situation where a single user action causes many view updates. Each update may be cheap, but the total work becomes significant.
Reducing dependency breadth
The key lever is granularity. When views depend only on the specific data they need, updates become localized. Instead of propagating across the entire screen, changes affect only the relevant parts.
This often requires restructuring state:
- splitting large observable objects
- introducing more localized state holders
- avoiding indirect reads of shared collections
- ensuring each view depends on the smallest possible piece of data
The goal is not to eliminate shared state, but to control how changes propagate through the system.
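Applied to the earlier example, one possible split separates the header from the list, so a title change no longer invalidates the items (a sketch, not the only decomposition):

import SwiftUI

final class HeaderModel: ObservableObject {
    @Published var title: String = ""
}

final class ListModel: ObservableObject {
    @Published var items: [Item] = []
}

struct HeaderView: View {
    @ObservedObject var model: HeaderModel

    var body: some View {
        Text(model.title)
    }
}

struct ItemListView: View {
    @ObservedObject var model: ListModel

    var body: some View {
        // Only changes to items invalidate this view.
        List(model.items) { item in
            ItemView(item: item)
        }
    }
}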
Observation vs ObservableObject
ObservableObject provides coarse-grained updates. Any change emits a signal, even if the view depends on only one property.
More fine-grained observation reduces unnecessary updates by tracking which properties are actually accessed. This narrows propagation, but it does not replace the need for careful data modeling.
Even with fine-grained observation, reading a broad structure still creates a broad dependency.
Environment as a dependency surface
The environment introduces another layer of dependencies. Reading from the environment is convenient, but it creates a dependency on a shared structure. When environment values change, views may need to re-evaluate to determine whether the specific value they use has changed.
Even when a view does not ultimately update, there is still a cost associated with checking those dependencies. This becomes noticeable when many views read frequently changing values.
For this reason, values that change often, such as geometry or time-based data, should be introduced into the environment carefully.
UIKit
The same issue appears in imperative UI frameworks.
tableView.reloadData()

This call invalidates the entire table. All visible cells are recreated and reconfigured, even if only a single item has changed.
A more precise update limits the scope:
tableView.reloadRows(at: [indexPath], with: .automatic)

Updating the entire table introduces unnecessary work, while targeted updates keep the cost proportional to the change.
Critical path
Not all work contributes to perceived performance. The critical path includes everything required before the first meaningful frame appears.
Work performed during UI updates is part of that path. In SwiftUI, this includes reevaluating view bodies. Even small amounts of work can become significant when multiplied across many views.
Removing work from this path often provides larger gains than optimizing individual functions.
Closing thought
At the architectural level, performance work focuses on reducing how much work is triggered and how widely it propagates. Once this is under control, the remaining cost shifts to how efficiently that work is executed.
In SwiftUI, performance largely comes down to two constraints:
view bodies must execute quickly, and they must not run more often than necessary.
UI and rendering pipeline
Once data is prepared and scoped correctly, the system still needs to produce frames. This stage has its own cost, independent of architecture. Performance here depends on how efficiently the UI pipeline turns state into pixels.
Rendering pipeline
At a high level, each frame goes through a sequence:
- handle input
- update state
- recompute UI
- layout
- drawing
- compositing
All of this work must complete within a frame budget. If it does not, the system cannot produce a new frame in time, and the previous frame remains on screen longer than expected. This is perceived as a hitch.
SwiftUI and the frame budget
In SwiftUI, view updates are part of this pipeline. When state changes, SwiftUI schedules updates that will be processed before the next frame. During this phase, it reevaluates the body of views whose dependencies have changed.
This work runs on the main thread and competes directly with responsiveness. Two patterns tend to cause problems:
- view bodies that take too long to run
- too many view updates within a single frame
A single slow update can exceed the frame budget. A large number of small updates can have the same effect when combined.
Body execution cost
The body of a view is expected to be lightweight. It describes UI, but it is still executable code. Work performed inside body is part of the render-time path. This includes:
- formatting values
- creating helper objects
- performing transformations
- deriving display-ready data
Even if each operation is small, repeating it across many views increases total cost. A common pattern is computing derived values directly inside body. This ties the computation to every update cycle.
A more efficient approach is to move this work earlier:
- precompute derived values when data changes
- cache results when possible
- reuse expensive objects instead of recreating them
- limit body to assembling already-prepared data
This shifts work away from the critical path and reduces per-frame cost.
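For example, instead of formatting a price inside body on every update, the formatted string can be prepared once when the data changes (names are illustrative):

import SwiftUI

private let priceFormatter: NumberFormatter = {
    let formatter = NumberFormatter()
    formatter.numberStyle = .currency
    return formatter
}()

// Called when the model changes, not on every render.
func makePriceText(_ price: Decimal) -> String {
    priceFormatter.string(from: price as NSDecimalNumber) ?? ""
}

struct PriceRow: View {
    let priceText: String   // precomputed, display-ready

    var body: some View {
        // Body only assembles already-prepared data.
        Text(priceText)
    }
}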
Update frequency
Cost is not only about how expensive each update is, but how often updates occur. When dependencies are broad, a single state change can trigger updates across many views. Even if each update is fast, the combined work may exceed the frame budget.
Reducing unnecessary updates often has a larger impact than optimizing individual ones.
Layout, drawing, compositing
After view updates, the system proceeds with layout, drawing, and compositing.
Layout determines size and position. Its cost grows with hierarchy depth and dependency complexity.
Drawing converts view descriptions into rendering commands. Repeated elements amplify cost.
Compositing combines layers into the final image. Transparency and overlapping layers increase work due to blending.
These stages are affected by how many views are active and how often they change.
Scrolling as a stress case
Scrolling combines all of these factors:
- frequent state changes
- continuous view updates
- repeated layout and drawing
This makes it a natural stress test for performance. Issues that are not visible in static screens often appear during scrolling.
Visibility and laziness
Lazy containers limit view creation to the visible portion of the screen. This reduces:
- layout work
- drawing cost
- memory usage
However, it does not reduce the cost of updates for views that are already on screen. If those views update frequently or perform expensive work, the frame budget can still be exceeded.
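A typical use, assuming the Item and ItemView types from the earlier examples:

import SwiftUI

struct FeedList: View {
    let items: [Item]

    var body: some View {
        ScrollView {
            // Rows are created lazily as they approach the viewport.
            LazyVStack {
                ForEach(items) { item in
                    ItemView(item: item)
                }
            }
        }
    }
}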
Layout strategy in heavy collection screens
Most screens do not require manual layout. Auto Layout and SwiftUI layout are expressive, maintainable, and fast enough for the majority of product interfaces.
The trade-off becomes visible in performance-sensitive collection screens, especially when cells are complex, repeated many times, and updated frequently. This is common in server-driven UI, media-heavy feeds, and screens where a single item combines text, images, badges, actions, and dynamic visibility rules.
In these scenarios, layout computation itself can become part of the frame budget. Auto Layout introduces the cost of constraint resolution. SwiftUI introduces the cost of view reevaluation, dependency tracking, and layout passes. None of these costs are inherently problematic, but they can accumulate when many elements are measured and positioned during scrolling.
Frame-based layout can reduce this overhead by making geometry explicit. This approach shifts layout from implicit resolution to explicit calculation. It is less flexible and usually more verbose, but in hot paths it can keep layout cost predictable and proportional to the number of visible elements.
The decision should be measurement-driven. For most screens, Auto Layout or SwiftUI is the better engineering choice. For a small number of heavy collection screens, manual layout can be a useful performance boundary.
Closing thought
At this level, performance work focuses on ensuring that UI updates fit within the frame budget. This requires both keeping individual updates lightweight and limiting how many updates occur per frame.
Network and backend interaction
Even with an efficient UI pipeline, the interface cannot render meaningful content until data arrives. The time between initiating a request and receiving a response defines the starting point for the entire flow.
At this level, performance depends on how requests are structured, how many steps are required, and how quickly the backend responds.
Latency as a sequence of steps
Network latency is not a single value. It is the result of several sequential stages:
- connection setup
- secure handshake
- request transmission
- server processing
- response download
Each stage contributes to the total delay. Understanding this breakdown helps identify where time is actually spent.
Round trips
The number of round trips between client and server has a direct impact on latency. A common pattern that increases latency is chaining requests:
let user = try await fetchUser()
let feed = try await fetchFeed(user.id)

The second request cannot start until the first completes, so total latency becomes cumulative.
Reducing the number of dependent requests often provides larger gains than optimizing individual calls.
Request orchestration
The structure of requests determines how latency accumulates. Independent requests can be executed concurrently:
async let user = fetchUser()
async let feed = fetchFeed()
let (u, f) = try await (user, feed)

This overlaps waiting time and reduces total latency to the slowest request rather than the sum of all requests.
The key constraint is dependency: only independent operations can be parallelized safely.
Payload size and processing
The size of the response affects both transfer time and client-side work.
Large payloads:
- take longer to download
- require more decoding and transformation
In many cases, connection setup dominates smaller requests, while payload size becomes more important for large responses or batch operations.
Backend processing
Part of the delay originates on the server.
This includes:
- data aggregation
- database queries
- business logic
From the UI perspective, backend latency and network latency appear identical. Improving client performance alone cannot compensate for slow server processing.
Network variability
Network conditions vary across devices and environments. Latency, bandwidth, and reliability are not constant. A flow that performs well in ideal conditions may degrade under real-world usage.
Designing request patterns with variability in mind improves consistency.
Common inefficiencies
Typical patterns that increase latency:
- sequential requests with unnecessary dependencies
- multiple small requests instead of aggregated responses
- blocking UI on non-critical data
Each of these increases the time required before rendering can begin.
Request concurrency is not free
Parallel requests reduce waiting time when operations are independent and latency dominates the flow. They do not multiply available bandwidth.
If the network is constrained, several simultaneous downloads compete for the same limited resource. A screen that starts five image downloads, an analytics request, recommendations, and the main content request at the same time may delay the one request that actually unlocks the first meaningful render.
This is why request orchestration should consider priority, not only dependency. Critical data should receive the earliest opportunity to complete. Secondary data can often be deferred, loaded with lower priority, or requested when it becomes visible.
On a fast connection, moderate concurrency may improve total latency. On a poor connection, reducing concurrency can make the experience more predictable because the app stops competing with itself.
The important distinction is whether the flow is latency-bound or bandwidth-bound. Small independent requests often benefit from concurrency. Large payloads, images, and repeated prefetching can saturate the connection and slow down the content the user is waiting for.
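With URLSession, task priority offers one lever for this. A sketch with placeholder URLs: the content request is raised above the recommendations request so the first meaningful render is not starved.

import Foundation

func loadScreen(contentURL: URL, recommendationsURL: URL) {
    let session = URLSession.shared

    // Critical: unlocks the first meaningful render.
    let content = session.dataTask(with: contentURL) { data, _, _ in
        // render content
    }
    content.priority = URLSessionTask.highPriority
    content.resume()

    // Secondary: can complete later without blocking the screen.
    let recommendations = session.dataTask(with: recommendationsURL) { data, _, _ in
        // fill in secondary UI
    }
    recommendations.priority = URLSessionTask.lowPriority
    recommendations.resume()
}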
Closing thought
At this level, performance work focuses on reducing the time required to obtain data and structuring requests so that waiting periods overlap wherever possible.
Data and caching
Network latency defines how long it takes to obtain data. Caching determines whether that work needs to happen again.
At this level, performance work focuses on reuse. The goal is to avoid performing the same operation multiple times.
Sources of repeated work
Repeated work appears in several forms:
- fetching the same data multiple times
- parsing and transforming identical responses
- rebuilding derived models
- recomputing UI-ready data
Each repetition adds latency and increases system load.
Basic caching model
Caching stores the result of an operation and reuses it when needed.
func loadFeed() async throws -> [Item] {
    let data = try await fetchFeed()
    return parse(data)
}

Repeated calls trigger both network and parsing work.
With caching:
func loadFeed() async throws -> [Item] {
    if let cached = cache.feed {
        return cached
    }
    let data = try await fetchFeed()
    let parsed = parse(data)
    cache.feed = parsed
    return parsed
}

Subsequent calls avoid both network and computation.
Reuse over recomputation
Caching is one form of reuse, but the principle is broader.
Any deterministic transformation can be reused:
- parsed models
- formatted values
- filtered collections
Avoiding recomputation reduces both latency and CPU usage.
Prefetching
Prefetching shifts work earlier in time. Instead of waiting for a request, the system anticipates future needs:
func loadNextPageIfNeeded(currentIndex: Int) {
    if currentIndex > items.count - 5 {
        Task {
            // A real implementation would also guard against
            // duplicate in-flight requests.
            let next = try await fetchNextPage()
            cache.append(next)
        }
    }
}

When the user reaches the next state, the data is already available.
Offline as reliability
Caching also improves resilience. When data cannot be fetched due to network issues or backend failures, cached data allows the flow to continue.
The goal is not perfect freshness, but continuity of the user experience.
Cache layers
Different goals require different strategies.
- general-purpose cache → improves performance by storing recent data
- fallback cache → ensures availability of critical flows
Separating these concerns allows better control over behavior.
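A sketch of that separation at the data level: NSCache holds recent responses in memory, while a file copy serves as the fallback layer (names and structure are illustrative).

import Foundation

final class FeedCache {
    private let memory = NSCache<NSString, NSData>()
    private let directory: URL

    init(directory: URL) {
        self.directory = directory
    }

    private func fileURL(for key: String) -> URL {
        directory.appendingPathComponent(key)
    }

    func store(_ data: Data, forKey key: String) {
        // Fast path for repeated reads in the same session.
        memory.setObject(data as NSData, forKey: key as NSString)
        // Fallback layer: survives relaunch and network failures.
        try? data.write(to: fileURL(for: key), options: .atomic)
    }

    func load(forKey key: String) -> Data? {
        if let cached = memory.object(forKey: key as NSString) {
            return cached as Data
        }
        return try? Data(contentsOf: fileURL(for: key))
    }
}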
Trade-offs
Caching introduces trade-offs:
- stale data
- inconsistencies between sources
- complexity of invalidation
These factors must be managed explicitly.
What not to cache
Caching is not suitable for all cases:
- highly dynamic data
- sensitive operations requiring strong consistency
- write-heavy flows
In such cases, recomputation or refetching is often preferable.
Closing thought
At this level, performance work focuses on eliminating repeated operations. Once reuse is introduced, both latency and system load decrease, and the system becomes more predictable.
Energy and power
Performance discussions often focus on latency. Energy introduces a different perspective. The same system that produces frames and handles requests also consumes battery. At this level, the question shifts from how fast something runs to how long and how often it runs.
Energy usage follows the work performed by the application. CPU and GPU activity both contribute to power consumption. Short bursts of work are usually inexpensive, while sustained activity over time is what drains the battery.
CPU as the primary driver
CPU activity is one of the main contributors to energy usage. The important factor is not a single operation, but the duration and frequency of activity.
A function that runs once and completes quickly has a negligible impact. A task that runs repeatedly, even if each iteration is small, keeps the CPU active and increases power usage.
Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { _ in
    updateState()
}

This pattern creates continuous work. The CPU wakes up ten times per second regardless of whether the update is needed. Over time, this contributes more to energy consumption than a larger operation executed once.
Continuous work
A useful distinction is between burst work and continuous work.
Burst work happens in response to user actions or discrete events. It has a clear start and end. Once completed, the system can return to an idle state.
Continuous work runs over extended periods. It often comes from timers, polling, animations, or frequent state updates. The system remains active even when the user is not interacting with it.
Energy usage grows with sustained activity. Reducing or eliminating unnecessary continuous work usually has a noticeable effect.
Hidden work
Some sources of work are easy to overlook.
Timers running in the background, periodic network requests, or indirect state updates can keep the system active without providing visible value.
Task {
    while true {
        await refreshData()
        try? await Task.sleep(nanoseconds: 1_000_000_000)
    }
}

This loop performs a network request every second. Even if the data rarely changes, the application continues to perform work indefinitely.
Identifying and removing such patterns often reduces energy usage without affecting the user experience.
UI activity
UI updates contribute to sustained work when they occur frequently. Animations and scrolling require continuous frame production. Each frame goes through layout, drawing, and compositing. When updates happen at a steady rate, the system remains active for the duration of the interaction.
The impact depends on both frequency and complexity. Reducing unnecessary updates or limiting their duration lowers overall cost.
Background activity
Work performed outside direct user interaction deserves special attention. Background synchronization, periodic refresh, and long-running tasks can keep the application active when it is not visible. In these cases, the cost is harder to justify because there is no immediate user benefit.
Controlling when background work runs and how often it executes helps reduce unnecessary energy usage.
Measuring energy
Energy usage is not obvious from code inspection. It requires measurement. Tools such as the Power Profiler in Instruments show how different components contribute to overall power consumption over time. Patterns matter more than individual spikes. Observing realistic scenarios gives a more accurate picture than short synthetic tests.
Relationship to earlier layers
Energy reflects decisions made at previous levels:
- broad dependencies increase update frequency
- frequent UI changes increase rendering cost
- repeated network activity extends execution time
Each of these contributes to sustained activity, which in turn affects power consumption.
Closing thought
At this level, performance work focuses on how long the system remains active. Reducing unnecessary continuous work leads to more efficient behavior over extended sessions.
Concurrency and scheduling
Even when the amount of work is reduced and each operation is efficient, the system can still feel slow. The difference often comes from when and where that work is executed.
At this level, performance depends on scheduling. Tasks compete for time and resources, and the way they are ordered affects responsiveness.
Scheduling and perceived latency
Two systems can perform the same operations and produce different results from the user’s perspective. If work that affects the interface is delayed behind background tasks, the application appears unresponsive. If it is prioritized correctly, the same workload can feel immediate.
Responsiveness depends on how quickly the system can react to input and produce visible changes.
Main thread and UI updates
The main thread is responsible for handling user input and updating the interface. Any blocking operation on this thread prevents the UI from progressing.
let data = try Data(contentsOf: url)

If this code runs on the main thread, the application cannot process touches or update the screen until the operation completes.
Keeping the main thread available for UI work is essential for responsiveness.
Moving work off the main thread
Work that does not require direct interaction with the UI should be performed elsewhere.
Task {
    let data = try await fetchData()
    await MainActor.run {
        self.state = data
    }
}

Here, data is fetched asynchronously, and only the final state update runs on the main thread.
The goal is not to move all work away from the main thread, but to keep it focused on UI updates.
Concurrency and parallelism
Concurrency describes how tasks are structured. Parallelism describes whether they run simultaneously. Improving responsiveness often comes from better structuring rather than increasing parallel execution.
Task structure
The way tasks are composed affects latency. As in Networking, sequential execution accumulates delays:
let a = await loadA()
let b = await loadB()

Independent operations can be executed concurrently:

async let a = loadA()
async let b = loadB()
let (resultA, resultB) = await (a, b)

This reduces total waiting time.
Priorities
Not all work has the same importance.
Task(priority: .background) {
    await syncData()
}

Priorities help ensure that user-facing tasks are executed before background work. Incorrect prioritization can delay visible updates.
Coordination and contention
When multiple tasks access shared resources, coordination is required. This introduces waiting. Actors, queues, and locks manage access differently, but all can introduce contention if overused.
Reducing unnecessary synchronization improves throughput and responsiveness.
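Actors make the contention point explicit. A minimal sketch: shared mutable state lives inside the actor, and callers await access instead of taking locks.

actor RequestTracker {
    private var inFlight: Set<String> = []

    // Returns false if a request with the same key is already running,
    // which also prevents duplicate work under concurrency.
    func begin(_ key: String) -> Bool {
        guard !inFlight.contains(key) else { return false }
        inFlight.insert(key)
        return true
    }

    func end(_ key: String) {
        inFlight.remove(key)
    }
}

Every call serializes on the actor, so hot paths should keep the protected section small.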
Common scheduling issues
Typical problems include:
- blocking the main thread
- assigning high priority to non-critical tasks
- creating too many competing tasks
- unnecessary synchronization
These issues do not increase the amount of work, but they affect how it is distributed.
Relationship to previous layers
Scheduling interacts with earlier decisions:
- architecture defines how much work exists
- UI defines how often it runs
- network defines when data becomes available
Scheduling determines how this work is executed over time.
Closing thought
At this level, performance work focuses on ensuring that critical tasks are executed without delay and that the system remains responsive.
Algorithms and data structures
Up to this point, improvements focused on reducing unnecessary work and organizing execution. This section addresses the cost of the remaining work. Algorithms and data structures define how much work is required in principle.
Complexity in practice
Algorithmic complexity describes how cost grows with input size, but it does not fully determine performance in real systems.
func containsLinear(_ array: [Int], value: Int) -> Bool {
    for element in array {
        if element == value {
            return true
        }
    }
    return false
}

Using a different structure changes the behavior:

let set = Set(array)

func containsSet(_ set: Set<Int>, value: Int) -> Bool {
    return set.contains(value)
}

The difference becomes significant with large inputs or repeated operations. However, asymptotic complexity alone is not enough. Two approaches with similar Big-O can behave differently once memory access patterns and CPU behavior are taken into account.
When it matters
Algorithmic improvements matter when they affect hot paths. If most time is spent waiting on I/O, network, or rendering, optimizing algorithms may not change overall performance.
At the same time, when computation is on the critical path, algorithm choice defines the upper bound of performance. This is where improvements become visible.
Memory access patterns
The way data is accessed often matters more than the number of operations.
for i in 0..<array.count {
    process(array[i])
}

Sequential access is predictable and aligns well with CPU caches. Irregular access patterns introduce latency even if the number of operations is small.
This is why some theoretically efficient algorithms behave poorly in practice. For example, binary search minimizes comparisons, but jumps across memory and can miss the cache repeatedly. In contrast, a linear scan may perform more comparisons but benefit from predictable access.
Choosing data structures
Data structures define both operation cost and access patterns.
- Arrays favor iteration and locality
- Sets and dictionaries favor lookup but trade off locality
- Trees support ordered access but often rely on pointer traversal
The choice is not only about complexity, but also about how data is laid out and accessed.
Abstractions and hidden cost
High-level abstractions can introduce overhead that is not visible at the source level. Generic code, protocol-based dispatch, and framework boundaries may add:
- indirect calls
- metadata lookups
- lack of specialization
In hot paths, these costs can dominate the actual algorithm. Measurement is required to determine whether abstraction overhead is significant.
Predictability and branching
Processors rely on branch prediction to maintain throughput.
for value in array {
    if value > threshold {
        processHigh(value)
    } else {
        processLow(value)
    }
}

If the condition follows a stable pattern, prediction is effective. If it varies unpredictably, mispredictions occur and partially executed work must be discarded.
In tight loops, this can noticeably reduce performance. In some cases, restructuring the computation or data to improve predictability is more effective than reducing the number of operations.
Closing thought
At this level, performance is shaped not only by how many operations are required, but by how those operations interact with memory and CPU behavior. The most efficient approach is the one that aligns algorithmic structure with the way the system executes code.
Memory and representation
Algorithms define how many operations are required. Memory determines how expensive each operation is in practice.
Heap allocations
Frequent allocations introduce overhead and pressure the allocator.
for _ in 0..<1000 {
    let object = MyClass()
    process(object)
}

Reducing allocations or reusing objects lowers cost and often improves cache locality.
Copying
Copy-on-write delays copying but does not eliminate it.
var a = largeArray
var b = a
b.append(1)

Frequent mutations can trigger repeated copies, turning a convenient abstraction into a performance cost.
Data layout
Contiguous memory improves performance:
for value in array {
    process(value)
}

Each element is located near the previous one, which matches how caches load data in blocks.
Pointer-heavy structures spread data across memory. Accessing them requires following references, increasing latency and reducing cache efficiency.
Data layout can dominate performance. Even an efficient algorithm can become slow if it repeatedly misses the cache.
Stack and heap
Stack allocation is cheap and predictable. Heap allocation is more flexible but more expensive. Value types often benefit from inline storage, reducing allocation overhead and improving locality.
Inline storage
Keeping data inside structures avoids additional allocations and improves access patterns. This is especially important in hot paths.
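A minimal illustration of the difference:

// Value type with inline storage: an array of these
// keeps all coordinates in one contiguous block.
struct Point {
    var x: Double
    var y: Double
}

// Reference type: the array stores references, and each
// object is a separate heap allocation.
final class PointObject {
    var x = 0.0
    var y = 0.0
}

let values = [Point(x: 0, y: 0), Point(x: 1, y: 1)]
let objects = [PointObject(), PointObject()]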
Borrowing and avoiding copies
Reusing memory avoids unnecessary work:
func process(_ values: ArraySlice<Int>) {
    for value in values {
        handle(value)
    }
}

Working with slices or views allows processing subsets of data without copying, reducing memory traffic.
Closing thought
At this level, performance depends on how data is stored and accessed. Efficient memory usage reduces the cost of every operation and often has a larger impact than micro-optimizing individual instructions.
CPU and hardware
At the lowest level, performance depends on how the processor executes instructions. Code is written as a sequence of operations, but modern CPUs do not execute it in a strictly linear way.
They overlap work, execute instructions out of order, and attempt to keep all parts of the pipeline busy. This behavior explains why similar code can have very different performance characteristics.
Instruction pipeline
Execution is split into stages. While one instruction is being completed, others are already being prepared or executed. This overlap increases throughput. The pipeline is most efficient when instructions can flow without interruption. When execution stalls, overall performance drops.
Bottlenecks and stalls
Performance is limited not by the number of instructions alone, but by where the pipeline stalls.
Common causes include:
- branch mispredictions
- cache misses
- memory latency
- dependencies between instructions
Understanding which bottleneck is dominant is key to meaningful optimization.
Branch prediction
Conditional logic introduces uncertainty.
for value in array {
    if value > threshold {
        processHigh(value)
    } else {
        processLow(value)
    }
}

When prediction is correct, execution continues smoothly. When it is wrong, partially completed work is discarded and the pipeline restarts.
Unpredictable branching in tight loops can significantly reduce throughput. In some cases, restructuring logic to reduce unpredictability or using branchless patterns can help, but these optimizations should be guided by measurement.
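A classic illustration is processing sorted versus unsorted data. Sorting has its own cost, so this only pays off when the loop dominates; the numbers here are arbitrary.

let values = (0..<1_000_000).map { _ in Int.random(in: 0..<256) }
let threshold = 128

// With random order, the branch flips unpredictably.
// After sorting, it follows long stable runs and predicts well.
var sum = 0
for value in values.sorted() {
    if value > threshold {
        sum += value
    }
}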
Cache and memory access
Processors rely on multiple levels of cache. Accessing data already in cache is fast. Accessing data from main memory is much slower. Cache lines load blocks of memory at once. Sequential access benefits from this behavior, while irregular access patterns lead to cache misses and stalls.
Data layout and cache behavior
Data layout directly affects cache efficiency. Contiguous structures align with cache behavior. Pointer-based structures often do not.
Some algorithms are inherently cache-unfriendly. In such cases, reorganizing data can improve performance even without changing the algorithm itself.
A simple example illustrates the difference:
// Contiguous memory
let array = Array(0..<1_000_000)
for value in array {
    process(value)
}

// Pointer-based structure
final class Node {
    let value: Int
    var next: Node?

    init(value: Int, next: Node? = nil) {
        self.value = value
        self.next = next
    }
}

var head: Node? = nil
for i in (0..<1_000_000).reversed() {
    head = Node(value: i, next: head)
}

var current = head
while let node = current {
    process(node.value)
    current = node.next
}

Both examples perform a linear traversal. The number of operations is similar, but the memory access pattern is not.
In the array case, elements are stored next to each other. This allows the CPU to load multiple values at once using cache lines, making access predictable and efficient.
In the linked structure, each step follows a pointer to an unrelated memory location. This breaks locality and often results in cache misses, forcing the CPU to wait for data.
Instruction-level parallelism
Modern CPUs exploit instruction-level parallelism by overlapping the execution of independent operations within a single core. While one instruction is being processed, others can be prepared, scheduled, or executed in parallel across different units of the pipeline.
This parallelism is only possible when operations do not depend on each other.
// Dependent operations
var sum = 0
for i in 0..<array.count {
    sum += array[i]
}

Each iteration depends on the previous value of sum, which limits parallel execution.
// Independent operations (assumes array.count is even)
var a = 0
var b = 0
for i in stride(from: 0, to: array.count, by: 2) {
    a += array[i]
    b += array[i + 1]
}

Here, updates to a and b are independent, allowing the CPU to execute them more efficiently.
This is different from thread-level parallelism. Instruction-level parallelism happens within a single core and is managed by the processor itself. Adding more threads does not help if execution is limited by data dependencies, pipeline stalls, or memory access latency.
Measurement
At this level, intuition is unreliable. Profiling shows where time is spent. More advanced tools can reveal:
- exact execution paths
- abstraction overhead
- hardware bottlenecks
Different tools answer different questions. Without measurement, optimization efforts often target the wrong problem.
Closing thought
CPU-level optimization is most effective when higher-level inefficiencies have already been removed. At that point, understanding pipeline behavior, branching, and memory access can lead to measurable improvements.
Connecting the layers
Each section in this article focuses on a different level of the system. Taken in isolation, they can look like separate topics. In practice, they describe the same problem from different angles.
A performance issue rarely belongs to a single layer. A slow screen might be caused by unnecessary data fetching, excessive UI updates, or inefficient scheduling. Looking at one layer without considering the others often leads to local optimizations with limited impact.
A common mistake is to start at the lowest level. It is tempting to analyze CPU behavior or micro-optimize code because it is precise and measurable. However, by the time execution reaches that level, most of the cost has already been defined by earlier decisions.
A more reliable approach follows the flow of the system.
It starts with the product perspective. The first step is to determine whether there is a real problem from the user’s point of view. If the experience is acceptable, further optimization may not change outcomes.
Once the problem is confirmed, metrics provide a way to locate it. They show where time is spent and which scenarios are affected.
Architecture explains why the system performs that work. It defines how data flows and how widely updates propagate. Changes at this level often reduce the total amount of work.
UI and network layers reveal how work is executed and where delays appear during interaction.
Runtime and memory refine the cost of operations. They matter when unnecessary work has already been removed.
CPU-level behavior becomes relevant when the remaining workload is concentrated in specific hot paths. At that point, low-level optimizations can provide additional gains.
This order is important. CPU tools are most useful after abstraction overhead and unnecessary work have been addressed. Otherwise, they often measure symptoms rather than causes.
Across all these levels, one idea remains consistent. Performance improves when the system performs less work and performs it at the right time.
In many cases, the most effective change is not to make code faster, but to avoid running it at all. Deferring work, reducing how often it runs, and simplifying what it does often lead to larger improvements than optimizing individual operations.
This leads to a practical way to think about optimization.
Reduce the amount of work. Ensure that the remaining work is necessary. Schedule it so that it does not interfere with responsiveness. Execute it in a way that aligns with the system and the hardware.
Measurement is what connects these ideas to real systems. Performance issues should be identified before they are optimized. Once a bottleneck is located, targeted improvements can have a measurable impact. Without that, changes often increase complexity without affecting the outcome.
Optimizing everything is rarely effective. Most parts of the system are not limiting performance, and improvements there do not translate into a better user experience.
When the problem is well understood and the bottleneck is clear, lower-level optimizations become meaningful. Without that context, improvements at the bottom of the stack rarely change overall behavior.
