Iain Finlayson

10 reasons not to use bitdrift: 4, you only care about crashes

If your app is 99% crash-free, does that mean 99% of users are happy? Almost never. Yet mobile teams obsess over crash-free rates. In this post, the fourth in a series investigating reasons not to use bitdrift, we’ll discuss why crash-free rates and crash reporting tools alone do not constitute a mobile observability strategy.

For a long time, mobile observability wasn’t really a thing people talked about. Dev teams had only backend monitoring, which doesn’t tell you much about how users are experiencing your app, and crash reports, which only capture the rare issues that cause the app process to stop. If crash rates were below 1% of sessions, that meant 99% of customers were happy, right? Wrong. Let’s discuss some of the reasons why.

Focus on p99s, not crashes

If “crash-free rate” is your primary health metric, you’re effectively blind to what’s happening in 99% of your user experiences. To give you an idea of what you might be missing, here are a few examples.
  • A user abandons their cart because checkout is slow
  • A user backgrounds the app because the search hangs
  • A user switches streaming services because playback stutters
  • A gamer rage-taps through a soft lock and uninstalls
At bitdrift, we call these the “p99 issues”. These issues rarely show up in dashboards or crash reports. But they show up immediately in churn, reviews, and revenue. Others use the term “paper cut issues.” I think you get the analogy: rare, intermittent, environment-dependent issues that affect only a small percentage of users. None of these situations triggers a crash, but over time, they degrade the user experience and lead to churn.

Crash reporting tools don’t catch most p99 issues, and when they do surface, they are often impossible to reproduce or debug. It’s also very likely that your server-side observability tooling is missing them, too. For more on this topic, see my previous post.

To illustrate what we mean by a p99 issue, let’s look at some real customer data. The charts below show completely unsampled time-to-interactive (TTI) metrics gathered from 1.2 million app starts over a seven-day period. The 50th percentile looks pretty good, with TTI in the 2–2.5s range. The p99, on the other hand, tells a very different story, with TTI consistently in the 20–25s range. That’s roughly 12,000 app starts that took more than 20 seconds. That’s 12,000 chances to lose a customer in one week!

Crash reports also don’t help here because they don’t measure performance. They don’t help you understand the true p99 latencies on your critical APIs, your users’ true p99 memory and disk usage, or your true p99 app install sizes. In fact, they can’t even tell you what your true crash rate is. Let’s talk about that next.
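To make the arithmetic behind the TTI example concrete, here is a minimal Python sketch of a nearest-rank percentile and the tail count it implies. The sample values and the helper names `percentile` and `tail_count` are hypothetical, chosen only for illustration; they are not customer data and not part of any bitdrift API.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # Multiply before dividing so whole-number percentiles give exact ranks.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_count(total_events, pct):
    """Number of events that fall above the given whole-number percentile."""
    return total_events * (100 - pct) // 100


# Hypothetical TTI samples in seconds: 98 quick starts plus 2 very slow ones.
tti_seconds = [2.0 + 0.005 * i for i in range(98)] + [21.0, 24.0]

p50 = percentile(tti_seconds, 50)
p99 = percentile(tti_seconds, 99)
slow_starts = tail_count(1_200_000, 99)  # the 1% tail of 1.2 million app starts

print(f"p50={p50:.2f}s p99={p99:.2f}s starts above p99={slow_starts}")
```

In this made-up data set the median sits just over 2 seconds while the p99 is 21 seconds, so a dashboard that only shows the median would hide the slow tail completely.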

Sampling impacts crash rates, too!

Can you trust your stability metrics if your crash reporting tool is sampling? Short answer: no. Let’s look at some real data again. The following image shows an Instant Insights stability dashboard from one of our customers with millions of daily active users (DAUs). This customer’s legacy crash reporting tool consistently reported crash-free rates above 99%. Why? Because it only calculated that rate on a 5% sample. For a customer of this size, 1% represents hundreds of thousands of user sessions per day.
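As a rough illustration of one way sampling can blur stability metrics, the sketch below uses the standard normal approximation to estimate the 95% margin of error on a crash-free rate. The cohort size of 2,000 sessions and the function name `crash_free_margin` are assumptions made for this example, not figures from the customer above, and the approximation is crude when the rate is very close to 100%.

```python
import math


def crash_free_margin(crash_free_rate, sessions_measured, z=1.96):
    """Approximate 95% margin of error for a crash-free rate (normal approximation)."""
    if sessions_measured <= 0:
        raise ValueError("sessions_measured must be positive")
    variance = crash_free_rate * (1 - crash_free_rate) / sessions_measured
    return z * math.sqrt(variance)


cohort_sessions = 2_000   # hypothetical small cohort, e.g. one device model
sample_fraction = 0.05    # the 5% sample mentioned above

all_sessions = crash_free_margin(0.99, cohort_sessions)
sampled = crash_free_margin(0.99, round(cohort_sessions * sample_fraction))

print(f"all sessions: +/-{all_sessions:.2%}, 5% sample: +/-{sampled:.2%}")
```

With these assumed numbers, the sampled estimate carries a margin of roughly two percentage points, which is wider than the 1% crash budget it is supposed to track. The smaller the cohort, the worse this gets.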

Built for the edge cases

bitdrift doesn’t optimize for averages. It optimizes for outliers. The Capture SDK generates synthetic metrics on the device and sends only the telemetry you need for real-user monitoring and alerting, without the overhead of sending all logs. This is how bitdrift gives you fully unsampled dashboards without impacting performance (and therefore your users’ experience). This means you catch every slow request, every hang, every retry storm, every degraded session, and every weird device/network combination. When something happens only to the 99th percentile, you can catch and debug it. Let’s move on to why debugging can suck with crash reporting tools.
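The Capture SDK’s internals are not described in this post, so the following is only a generic sketch of the idea of aggregating on the device: every observation is counted in a fixed bucket, and only the small table of counts is sent. The class `LatencyHistogram`, its bucket bounds, and the payload shape are all hypothetical and do not reflect bitdrift’s actual API or wire format.

```python
from bisect import bisect_left


class LatencyHistogram:
    """Counts every observation in a fixed bucket so only the counts need to be sent."""

    def __init__(self, upper_bounds_ms):
        if not upper_bounds_ms:
            raise ValueError("need at least one bucket bound")
        self.upper_bounds_ms = sorted(upper_bounds_ms)
        # One extra bucket catches everything above the largest bound.
        self.counts = [0] * (len(self.upper_bounds_ms) + 1)

    def record(self, latency_ms):
        # bisect_left finds the first bucket whose upper bound is >= latency_ms.
        self.counts[bisect_left(self.upper_bounds_ms, latency_ms)] += 1

    def snapshot(self):
        """Return the compact payload and reset the counters for the next interval."""
        labels = [f"<={bound}ms" for bound in self.upper_bounds_ms]
        labels.append(f">{self.upper_bounds_ms[-1]}ms")
        payload = dict(zip(labels, self.counts))
        self.counts = [0] * len(self.counts)
        return payload


histogram = LatencyHistogram([100, 500, 1000, 5000])
for latency_ms in [80, 100, 450, 900, 22000]:
    histogram.record(latency_ms)

payload = histogram.snapshot()
print(payload)
```

Because no observation is dropped before counting, the single 22-second outlier still lands in the overflow bucket, yet the payload stays the same size no matter how many requests were recorded.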

How bitdrift does crash reporting, properly

Just in case you were starting to think bitdrift doesn’t do crash reporting, let me be clear: bitdrift does crash reporting. Only, it does it better. There are three very common complaints we hear when talking to mobile devs about their crash reporting tools:
  • They constantly have to context switch between multiple tools while debugging issues
  • They can’t reproduce crashes in rare cohorts
  • They don’t have enough context beyond stack traces to debug a crash
bitdrift solves all three problems. First, bitdrift combines crash reporting, real user monitoring, alerting, debugging, session replay, and network traces in a single product.

Second, bitdrift eliminates the need to reproduce those weird edge cases that cause issues in unique cohorts. bitdrift doesn’t sample, so you catch every single weird edge case. When a crash or critical issue occurs, bitdrift automatically flushes the entire ring buffer from the device. That means you don’t just get a stack trace. You get the full story of what happened before it (for more on this topic, see the first post in this series). And if you need example sessions from a weird cohort you’ve identified, workflows let you target them at runtime and pull their logs without a release cycle. So there really is never a need to repro issues.

Flushing the ring buffer along with every crash solves the final problem of insufficient context. Stack traces are helpful but often not sufficient to identify the root cause, and devs often find that when they try to find the corresponding session replay or traces in another tool, they’re not there. This is never the case with bitdrift.

In summary, bitdrift is a one-stop shop for all things mobile observability. But if you are happy with just crash reporting, you probably shouldn’t use bitdrift. If this resonates, check out Reason #1 / #2 / #3 here. And if you want to try bitdrift for yourself, sign up for free to see it in action.
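For readers who want a concrete picture of the ring buffer flush described in this section, here is a minimal Python sketch. It assumes a fixed-capacity in-memory buffer and an `upload` callback; the class `RingBufferLogger`, its method names, and the report fields are hypothetical and are not the Capture SDK’s real interface.

```python
from collections import deque


class RingBufferLogger:
    """Keeps only the most recent log lines and ships them together with a crash."""

    def __init__(self, capacity, upload):
        # A deque with maxlen silently discards the oldest entry once it is full.
        self._buffer = deque(maxlen=capacity)
        self._upload = upload

    def log(self, message):
        self._buffer.append(message)

    def on_crash(self, stack_trace):
        # Attach everything still in the buffer so the trace arrives with its context.
        report = {"stack_trace": stack_trace, "recent_logs": list(self._buffer)}
        self._upload(report)
        self._buffer.clear()
        return report


uploaded = []  # stands in for a network upload in this sketch
logger = RingBufferLogger(capacity=3, upload=uploaded.append)
for event in ["app_start", "open_cart", "tap_checkout", "payment_request_sent"]:
    logger.log(event)

logger.on_crash("hypothetical stack trace")
print(uploaded[0]["recent_logs"])
```

Nothing leaves the device during normal use in this sketch; the buffered events are uploaded only when the crash handler runs, which is what lets the stack trace arrive with the steps that led up to it.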
