Diving into Swift compiler performance

It all starts by reading this week in Swift, and the article The best hardware to build with Swift is not what you might think, written by the LinkedIn team about how apparently their Mac Pros are slower at building Swift than any other Mac.

I’ve spent so much time waiting for Xcode to compile over the past years than I’ve often toyed with the idea of getting an iMac or even a Mac Pro for the maximum possible performance, so this caught my attention. I’ve also been wondering if instead of throwing money at the problem, there might be some easy improvements to either reduce build time or to improve Swift performance.

Looking at the reported issue I discovered a couple Swift compiler flags that were new to me: -driver-time-compilation and -Xfrontend -debug-time-compilation, which will show something like this:

                               Swift compilation
  Total Execution Time: 10.1296 seconds (10.6736 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.9556 ( 99.9%)   6.1701 (100.0%)  10.1257 (100.0%)  10.6697 (100.0%)  Type checking / Semantic analysis
   0.0013 (  0.0%)   0.0002 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  LLVM output
   0.0011 (  0.0%)   0.0001 (  0.0%)   0.0013 (  0.0%)   0.0013 (  0.0%)  SILGen
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  IRGen
   0.0003 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  LLVM optimization
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Parsing
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL optimization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  AST verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL verification (pre-optimization)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL verification (post-optimization)
   3.9589 (100.0%)   6.1707 (100.0%)  10.1296 (100.0%)  10.6736 (100.0%)  Total

I started looking into the results for the WordPress app and it looks like almost every bottleneck is in the Type Checking stage. As I’d find later, mostly in type inference. I’ve only tested this on Debug builds, as that’s where improvements could impact development best. For Release, I’d suspect the optimization stage would take a noticeable amount of time.

Time to look at what’s the slowest thing, and why. Disclaimer: the next shell commands might look extra complicated. I’ve used those tools for over a decade but never managed to fully learn all their power, so I’d jump from grep to awk to sed to cut and back, just because I know how to do it that way. I’m sure there’s a better way, but this got me the results I wanted so ¯\(ツ)/¯.

Before you run these close Xcode, and really everything else if you can so you get more reliable results.

Do a clean build with all the debug flags and save the log. That way you can query it later without having to do another build.

xcodebuild -destination 'platform=iOS Simulator,name=iPhone 7' \
  -sdk iphonesimulator -workspace WordPress.xcworkspace \
  -scheme WordPress -configuration Debug \
  clean build \
  OTHER_SWIFT_FLAGS="-driver-time-compilation \
    -Xfrontend -debug-time-function-bodies \
    -Xfrontend -debug-time-compilation" |
tee profile.log

Print the compiled files sorted by build time:

awk '/Driver Time Compilation/,/Total$/ { print }' profile.log |
  grep compile |
  cut -c 55- |
  sed -e 's/^ *//;s/ (.*%)  compile / /;s/ [^ ]*Bridging-Header.h$//' |
  sed -e "s|$(pwd)/||" |
  sort -rn |
  tee slowest.log

Show the top 10 slowest files:

head -10 slowest.log
2.9555 WordPress/Classes/Extensions/Math.swift
2.8760 WordPress/Classes/Utility/PushAuthenticationManager.swift
2.8751 WordPress/Classes/ViewRelated/Post/AztecPostViewController.swift
2.8748 WordPress/Classes/ViewRelated/People/InvitePersonViewController.swift
2.8741 WordPress/Classes/ViewRelated/System/PagedViewController.swift
2.8699 WordPress/Classes/ViewRelated/Views/WPRichText/WPTextAttachmentManager.swift
2.8680 WordPress/Classes/ViewRelated/Views/PaddedLabel.swift
2.8678 WordPress/Classes/ViewRelated/NUX/WPStyleGuide+NUX.swift
2.8666 WordPress/Classes/Networking/Remote Objects/RemoteSharingButton.swift
2.8162 Pods/Gridicons/Gridicons/Gridicons/GridiconsGenerated.swift

Almost 3 seconds on Math.swift? That doesn’t make any sense. Thanks to the -debug-time-function-bodies flag, I can look into profile.log and see it’s all the round function. To make this easier, and since it doesn’t depend on anything else in the app, I extracted that to a separate file. In this case, the -Xfrontend -debug-time-expression-type-checking flag helped identifying the line where the compiler was spending all the time:

return self + sign * (half - (abs(self) + half) % divisor)

When you look at it, it seems pretty obvious that those are all Ints, right? But what’s obvious to humans, might not be to a compiler. I tried another flag -Xfrontend -debug-constraints which resulted in a 53MB log file 😱. But trying to make sense of it, it became apparent that abs was generic, so the compiler had to guess, and +,-,*, and % had also multiple candidates each, so the type checker seems to go through every combination rating them, before picking a winner. There is some good information on how the type checker works in the Swift repo, but I still have to read that completely.

A simple change (adding as Int) turns the 3 seconds into milliseconds:

return self + sign * (half - (abs(self) as Int + half) % divisor)

I’ve kept going through the list and in many cases I still can’t figure out what is slow, but there were some quick wins there. After 4 simple changes, build time was reduced by 18 seconds, a 12% reduction.

Attaching patches to Pull Requests

This might sound strange, but sometimes I prefer patches to pull requests. The main scenario is when I’m reviewing someone else’s code and I want to propose an alternative implementation.

I could just create a new branch and pull request with my change, but then the conversation is split between two PRs, and there’s a new branch that you have to clean up.

When the change is small enough, or I’m not sure if it will be accepted, I’d rather send a patch. So far I’ve been doing git diff, uploading the result to gist, and posting the link as a comment in the PR. This has a few shortcomings:

  • No binary support.
  • If the original author wants to use it, authorship is usually lost, unless they use the --author option for git commit, and even then there’s room for typos.

I know there’s a better way, as Git was originally designed to share patches, not pull requests. I think I’ve been avoiding it because it’s not as common and the original author might not know what to do with the patch. So I’m writing this as a quick tutorial.

Creating a patch

Before creating a patch, you have to commit your changes. git format-patch will create a patch file for each commit, so your history can be preserved. Once you have a commit, your branch is ahead of origin, so we can use that to tell format-patch which commits to pick

branch=git rev-parse --abbrev-ref HEAD
git format-patch $origin

This will leave one or more .patch files in your project directory:

$ ls *.patch

Upload those to Gist and leave a comment with the link on the PR:

$ gist -co *.patch

Applying a patch

For a single patch, you can copy the Raw link in the Gist and download it

$ curl -sLO https://gist.github.com/koke/1b30d861e6bb9d366f69bc186d0e9525/raw/8cc27f3e589a7823b2e9f1746aa921b92da14187/0001-Store-relative-paths-for-reader-topics.patch

If there are multiple files, make sure you use the Download Zip link (or download all the files one by one):

$ curl -sLo patches.zip https://gist.github.com/koke/ab100907c17c4ef6a977350494679091/archive/3fb0136a21a6bc499bff2511750c62ae6dc41630.zip
$ unzip -j patches.zip 
Archive:  patches.zip
  inflating: 0001-Store-relative-paths-for-reader-topics.patch  
  inflating: 0002-Whitespace-changes.patch  

Once you have the patch file(s) in your project directory, just run git am -s *.patch:

$ git am -s *.patch
Applying: Store relative paths for reader topics
Applying: Whitespace changes

Review the changes, and if you’re happy with them, git push them. Otherwise, you can reset your branch to point at the pushed changes:

branch=git rev-parse --abbrev-ref HEAD
git reset --hard $origin

Finally, run git clean -df, or manually remove the downloaded files.

Composing operations in Swift

Continuing on From traditional to reactive, the problem I’m solving today is refactoring our image downloading system(s).

If I remember it correctly, it all started a long time ago when we switched to AFNetworking, and started using its UIImageView.setImageWithURL methods. To this day I still feel that there’s something terribly wrong in a view calling networking code directly, but that’s not today’s problem. Instead it’s how this system grew by adapting to each use case with ad-hoc solutions.

If I haven’t missed anything, we have:

  • AFNetworking’s setImageWIthURL: you pass a URL and maybe a placeholder, and it eventually sets the image. It uses its own caching.
  • UIImageView.downloadImage(_:placeholderImage:), which is basically a wrapper for AFNetworking’s method which sets a few extra headers. Although it’s not very obvious why it’s even there.
  • A newer UIImageView.downloadImage(_), which skips AFNetworking and uses NSURLSession directly. This was recently developed for the sharing extension to remove the AFNetworking dependency. It creates the request, does the networking, sets the result, and it adds caching. It also cancels any previous requests if you reset the image view’s URL.
  • Then we have downloadGravatar and downloadBlavatar which are just wrappers that take an email/hash or hostname and download the right Gravatar. At least these were refactored not long ago to move the URL generation into a separate Gravatar type. Although it seems the old methods are also still there and used in a few places.
  • WPImageSource: basically all it does is prevent duplicate requests. Imagine you just loaded a table to display some comments and you need to load the gravatars. Several of the rows are for the same commenter, so the naive approach would request the same gravatar URL several times. This coalesces all of the requests into one. This doesn’t really have anything to do with images in theory, and could be more generic, although it also handles adding authentication headers for private blogs.
  • WPTableImageSource: this one does quite a few things. The reason behind it was to load images for table view cells without sacrificing scrolling performance. So it uses Photon to download images with the size we need. If Photon doesn’t give us what we want (if the original image is smaller than the target size, we don’t want to waste bandwidth) we resize locally. We cache both original and resized images. It also keeps track of the index path that corresponds to an image so we can call the delegate when it’s ready, and it supports invalidating the stored index paths when the table view contents have changed, so we don’t set the images in the wrong cell.

These are all variations of the same problem, built in slightly different ways. What I want to do is take all of this into composable units, each with a single responsibility.

So I built an example for one of the simple cases: implement a UIImageView.setImageWithURL(_) that uses Photon to resize the image to the image view’s size, downloads the image and sets it. No caching, placeholders or error handling, but it customizes the request a bit.

This is what the original code looked like approximately (changed a bit so the example is easier to follow):

The problem I encountered is that Photon has an issue resizing transparent images, so for this I want to skip photon and do the resizing on the device. If you want to do that with the original code I see two options: duplicate it, or start adding options to the function to change its behavior. I’m not sure which option is worse.

The first step is to break the existing method into steps:

  1. “Photonize” the URL so we are asking for a resized image
  2. Build the request
  3. Fetch the data
  4. Turn it into an image
  5. Make sure it’s an image, or stop
  6. Set the image view’s image to that

So I’ve extracted most of those steps into functions and this is what I got:

This is much more functional, even if it looks sequential. It’s written that way because I find it’s easier to read, but you could write the same thing using nested functions.

If you’re into Lisp, you might find this version more pleasing, but in Swift this feels harder to follow than using intermediate variables. This is why other languages have things like the compose operator which you can use to compose functions without all that nesting. When I followed this route, I hit some limitations on Swift generics, and it also looks very foreign. This might work well in Haskell where all functions are curried, but it’s not so much in Swift.

Also note that the pattern of nesting breaks on fetchData as we don’t have a result yet. I would hardly call that callback hell, but we can start moving in that direction easily if we say we want all the image processing to happen in a background queue. Or if image resizing was an asynchronous operation.

Then I tried with RxSwift. All I wanted for this could be done with a much simpler Future type, since I don’t need side effects, cancellation, or any operator other than map/flatMap. But I already had RxSwift on the project so I’m using it as an example of the syntax. I also added a second version with the transform chain grouped into smaller pieces to improve readability.

What I haven’t tried yet is a solution based on NSOperation, but I have the feeling that it would add a lot of boilerplate, and wouldn’t feel completely right in Swift.

Finally, what I think I’d build is something based on the traditional version with customization points. I’d love to flatten the data pipeline and be able to just keep mapping over asynchronous values without depending on an external framework. For this example though, it seems that callbacks don’t complicate things much, at least yet. So I think I’ll start from something like this.

Here’s a gist with all the examples and the helper functions: ImageServices.swift

If you have a better design, I’d love to hear about it.

From traditional to reactive

Brent Simmons has a series of posts about Reactive vs “Traditional” which sound a lot like the arguments we’re having internally as a team lately. I’ve gotten into RxSwift in the past months and believe it’s a great solution to a number of problems, but it is indeed a big dependency and has a steep learning curve.

I find RxSwift declarative style much easier to read and follow than the imperative counterparts, but it took a while to get there. I remember struggling for months learning to think in terms of streams, and I was really motivated to learn it.

But I thought I’d take some time to take the example in Comparing Reactive and Traditional, as I don’t think it’s a fair comparison (and not in the way you might expect). You can download all the steps in a playground: Traditional to Reactive.playground.

My first “refactor” was just to clean up the code enough to compiled without errors, so I can put it on a playground.

Then I addressed what I found most confusing about the traditional implementation, which is all that throttling logic scattered around the view controller. There’s no need to go reactive there, we just need to contain all that logic and state in one place: Throttle. This is a generic class that takes a timeout and a callback, and you can send new values or cancel. It will call callback after timeout seconds since the last input unless a new input happens. Note that this class is generic and doesn’t need to know anything about the values. It is also easily testable.

This leaves us with a simpler view controller, but we still have some logic in there that’s not really specific and could be extracted: keeping only the last request and canceling the previous one. I’ve called this SwitchToLatest, and it couldn’t be simpler: it just has a switchTo method that keeps a reference to the task, and calls cancel on it whenever there’s a new one. It uses a new Cancelable protocol since we don’t really care about what the task is, we just need to know that we can cancel it. Also really easy to test.

And this is how the final ViewController looks:

Here’s where I think the fair comparison starts. The traditional version has all the behaviors we want in units with a single responsibility, and the view controller takes these building blocks to do what it needs.

What RxSwift brings to the table is the ability to compose these units and a common pattern for cancellation. What you can’t do here is take Throttle and pipe its output into SwitchToLatest. But in this case, that’s fine.

The live search example is typically a nice example of reactive code as it showcases all of it’s features: streams of values, composition of operators, side effects on subscription, and disposables/cancellation. But as we’ve seen, it doesn’t work as a way for reactive code to prove its value, as the traditional counterpart is simple enough.

Compare this to the thing I was trying to write in RxSwift.

  • We have some data that needs to be loaded from the API. That data might be used by a few different view controllers.
  • When a request fails because of network conditions, it should be automatically retried a few times before we give up.
  • We want to poll for new data every minute, counting since the last successful refresh.
  • If a request is taking more than a few seconds, the UI should display a message indicating that “this is taking longer than expected”.
  • If the request fails because an “unrecoverable” error or we give up on retrying, the UI should display an error and we shouldn’t try to poll for data until the user manually refreshes again.
  • The UI should show a message when it detects it’s offline, and stop any polling attempts. When we detect we’re online we should resume polling. (I later learned that Apple doesn’t recommend using Reachability to prevent network requests unless they’ve already failed).
  • If a second VC wants the same data, there should not be a second request going out. Not just caching, but reusing in-progress requests.

When written in RxSwift it was already a complex logic to follow, and I even had to write a couple of custom operators (pausable and retryIf). There’s no silver bullet, and that’s also true for RxSwift, but I believe it can help. I can’t imagine reasoning about all the possible states in a more traditional design.

I also would argue (although someone might disagree) that any time you use NSNotificationCenter, NSFetchedResultsController, or KVO, you are already doing some sort of reactive programming, just not in a declarative way.

Learn me a Haskell

Lately I’ve become interested in functional programming, and Haskell always felt like the thing I should learn, so this week I finally started reading  Learn You a Haskell. So far I’ve been through the first six chapters and they do a great job at indroducing many concepts.

Pattern matching got me hooked. I had read the Patterns chapter in the swift book, and a couple blog posts on the subject, but they always felt like little more than “switch with ranges”. After reading the pattern matching chapter in Learn You a Haskell, suddenly Patterns appeared much more useful, even in Swift.

And then I read some of the concepts around higher order functions, how all functions are curried, how trivial partial application is, and how concise and clear everything reads once you get past the syntax choices. My mind was blown time after time.