Diving into Swift compiler performance

It all started when I was reading This Week in Swift and came across the article The best hardware to build with Swift is not what you might think, written by the LinkedIn team, about how their Mac Pros are apparently slower at building Swift than any other Mac.

I’ve spent so much time waiting for Xcode to compile over the past years that I’ve often toyed with the idea of getting an iMac or even a Mac Pro for the maximum possible performance, so this caught my attention. I’ve also been wondering whether, instead of throwing money at the problem, there might be some easy ways to reduce build time or to improve the Swift compiler’s performance.

Looking at the reported issue, I discovered a couple of Swift compiler flags that were new to me: -driver-time-compilation and -Xfrontend -debug-time-compilation. Together they will show something like this:

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 10.1296 seconds (10.6736 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.9556 ( 99.9%)   6.1701 (100.0%)  10.1257 (100.0%)  10.6697 (100.0%)  Type checking / Semantic analysis
   0.0013 (  0.0%)   0.0002 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  LLVM output
   0.0011 (  0.0%)   0.0001 (  0.0%)   0.0013 (  0.0%)   0.0013 (  0.0%)  SILGen
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  IRGen
   0.0003 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  LLVM optimization
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Parsing
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL optimization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  AST verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL verification (pre-optimization)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL verification (post-optimization)
   3.9589 (100.0%)   6.1707 (100.0%)  10.1296 (100.0%)  10.6736 (100.0%)  Total

I started looking into the results for the WordPress app, and almost every bottleneck seems to be in the Type Checking stage. As I found out later, most of that time goes to type inference. I’ve only tested Debug builds, since that’s where improvements would help day-to-day development most. For Release builds, I suspect the optimization stage would also take a noticeable amount of time.

Time to look at what’s slowest, and why. Disclaimer: the following shell commands might look extra complicated. I’ve used these tools for over a decade but never fully learned everything they can do, so I jump from grep to awk to sed to cut and back, just because I know how to do it that way. I’m sure there’s a better way, but this got me the results I wanted, so ¯\_(ツ)_/¯.

Before you run these, close Xcode (and everything else you can) so you get more reliable results.

Do a clean build with all the debug flags and save the log. That way you can query it later without having to do another build.

xcodebuild -destination 'platform=iOS Simulator,name=iPhone 7' \
  -sdk iphonesimulator -workspace WordPress.xcworkspace \
  -scheme WordPress -configuration Debug \
  clean build \
  OTHER_SWIFT_FLAGS="-driver-time-compilation \
    -Xfrontend -debug-time-function-bodies \
    -Xfrontend -debug-time-compilation" |
tee profile.log

Print the compiled files sorted by build time:

awk '/Driver Time Compilation/,/Total$/ { print }' profile.log |
  grep compile |
  cut -c 55- |
  sed -e 's/^ *//;s/ (.*%)  compile / /;s/ [^ ]*Bridging-Header.h$//' |
  sed -e "s|$(pwd)/||" |
  sort -rn |
  tee slowest.log

Show the top 10 slowest files:

head -10 slowest.log
2.9555 WordPress/Classes/Extensions/Math.swift
2.8760 WordPress/Classes/Utility/PushAuthenticationManager.swift
2.8751 WordPress/Classes/ViewRelated/Post/AztecPostViewController.swift
2.8748 WordPress/Classes/ViewRelated/People/InvitePersonViewController.swift
2.8741 WordPress/Classes/ViewRelated/System/PagedViewController.swift
2.8699 WordPress/Classes/ViewRelated/Views/WPRichText/WPTextAttachmentManager.swift
2.8680 WordPress/Classes/ViewRelated/Views/PaddedLabel.swift
2.8678 WordPress/Classes/ViewRelated/NUX/WPStyleGuide+NUX.swift
2.8666 WordPress/Classes/Networking/Remote Objects/RemoteSharingButton.swift
2.8162 Pods/Gridicons/Gridicons/Gridicons/GridiconsGenerated.swift

Almost 3 seconds on Math.swift? That doesn’t make any sense. Thanks to the -debug-time-function-bodies flag, I can look into profile.log and see that all the time goes to the round function. That function doesn’t depend on anything else in the app, so I extracted it to a separate file to make it easier to investigate. In this case, the -Xfrontend -debug-time-expression-type-checking flag helped identify the line where the compiler was spending all the time:

return self + sign * (half - (abs(self) + half) % divisor)

When you look at it, it seems pretty obvious that those are all Ints, right? But what’s obvious to humans might not be obvious to a compiler. I tried another flag, -Xfrontend -debug-constraints, which produced a 53MB log file 😱. While I was trying to make sense of it, it became apparent that abs is generic, so the compiler had to guess its type. On top of that, +, -, * and % each have multiple candidate overloads. The type checker seems to go through every combination and rate each one before picking a winner. There is some good information on how the type checker works in the Swift repo, but I haven’t read all of it yet.

A simple change (adding as Int) turns the 3 seconds into milliseconds:

return self + sign * (half - (abs(self) as Int + half) % divisor)
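To see the expression in context, here is a minimal, hypothetical version of such a helper. Several details are illustrative choices, not the app’s actual Math.swift:

- the names roundedTo(multipleOf:) and roundedToSplit(multipleOf:);
- the way sign and half are computed;
- the assumption of a positive divisor.

The second function splits the expression into constants with explicit types, which is another common way to narrow down the overloads the type checker has to consider.

```swift
extension Int {
    /// Hypothetical helper: rounds to the nearest multiple of `divisor`
    /// (assumed > 0), rounding halfway values away from zero.
    func roundedTo(multipleOf divisor: Int) -> Int {
        let sign = self >= 0 ? 1 : -1
        let half = divisor / 2
        // `as Int` pins the generic abs(_:) to Int for the type checker.
        return self + sign * (half - (abs(self) as Int + half) % divisor)
    }

    /// Same computation, split into explicitly typed intermediate constants.
    func roundedToSplit(multipleOf divisor: Int) -> Int {
        let sign: Int = self >= 0 ? 1 : -1
        let half: Int = divisor / 2
        let absolute: Int = abs(self)
        let remainder: Int = (absolute + half) % divisor
        return self + sign * (half - remainder)
    }
}
```

Both functions round halfway values away from zero. With a divisor of 10, 15 rounds to 20 and -15 rounds to -20.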

I kept going through the list. In many cases I still can’t figure out what is slow, but there were some quick wins. After 4 simple changes, build time dropped by 18 seconds, a 12% reduction.
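When profile.log is large, going through the function-body timings by hand gets tedious. Below is a hypothetical Swift helper that ranks them. It assumes each -debug-time-function-bodies line has this shape:

- a time ending in ms,
- a tab,
- file:line:column,
- a tab,
- a description.

Check that against your own log, since the layout may differ between compiler versions.

```swift
import Foundation

/// One timed function body. Assumed line layout:
/// "<milliseconds>ms<TAB><file:line:column><TAB><description>"
struct FunctionTiming {
    let milliseconds: Double
    let location: String
    let name: String
}

/// Parses one log line, returning nil for lines that don't match the layout.
func parseTiming(_ line: String) -> FunctionTiming? {
    let fields = line.split(separator: "\t", maxSplits: 2).map(String.init)
    guard fields.count == 3 else { return nil }
    let time = fields[0].trimmingCharacters(in: .whitespaces)
    guard time.hasSuffix("ms"), let ms = Double(String(time.dropLast(2))) else {
        return nil
    }
    return FunctionTiming(milliseconds: ms, location: fields[1], name: fields[2])
}

/// Returns the `limit` slowest function bodies found in the log text.
func slowestFunctions(in log: String, limit: Int = 10) -> [FunctionTiming] {
    let timings = log.split(separator: "\n").compactMap { parseTiming(String($0)) }
    return Array(timings.sorted { $0.milliseconds > $1.milliseconds }.prefix(limit))
}
```

You could feed it the contents of profile.log, read with String(contentsOfFile:encoding:). Lines that aren't timings, such as regular build output, are skipped.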

Attaching patches to Pull Requests

This might sound strange, but sometimes I prefer patches to pull requests. The main scenario is when I’m reviewing someone else’s code and I want to propose an alternative implementation.

I could just create a new branch and pull request with my change, but then the conversation is split between two PRs, and there’s a new branch that you have to clean up.

When the change is small enough, or I’m not sure it will be accepted, I’d rather send a patch. So far I’ve been running git diff, uploading the result to a Gist, and posting the link as a comment in the PR. This has a few shortcomings:

  • No binary support.
  • If the original author wants to use it, authorship is usually lost, unless they use the --author option for git commit, and even then there’s room for typos.

I know there’s a better way, as Git was originally designed to share patches, not pull requests. I think I’ve been avoiding it because it’s not as common and the original author might not know what to do with the patch. So I’m writing this as a quick tutorial.

Creating a patch

Before creating a patch, you have to commit your changes. git format-patch creates a patch file for each commit, so your history is preserved. Once you’ve committed, your branch is ahead of origin, and you can use that to tell format-patch which commits to pick:

branch=$(git rev-parse --abbrev-ref HEAD)
origin="origin/$branch"
git format-patch "$origin"

This will leave one or more .patch files in your project directory:

$ ls *.patch
0001-Store-relative-paths-for-reader-topics.patch

Upload those to Gist and leave a comment with the link on the PR:

$ gist -co *.patch

Applying a patch

For a single patch, you can copy the Raw link in the Gist and download it:

$ curl -sLO https://gist.github.com/koke/1b30d861e6bb9d366f69bc186d0e9525/raw/8cc27f3e589a7823b2e9f1746aa921b92da14187/0001-Store-relative-paths-for-reader-topics.patch

If there are multiple files, make sure you use the Download Zip link (or download all the files one by one):

$ curl -sLo patches.zip https://gist.github.com/koke/ab100907c17c4ef6a977350494679091/archive/3fb0136a21a6bc499bff2511750c62ae6dc41630.zip
$ unzip -j patches.zip 
Archive:  patches.zip
3fb0136a21a6bc499bff2511750c62ae6dc41630
  inflating: 0001-Store-relative-paths-for-reader-topics.patch  
  inflating: 0002-Whitespace-changes.patch  

Once you have the patch file(s) in your project directory, just run git am -s *.patch:

$ git am -s *.patch
Applying: Store relative paths for reader topics
Applying: Whitespace changes

Review the changes, and if you’re happy with them, git push them. Otherwise, you can reset your branch to point at the pushed changes:

branch=$(git rev-parse --abbrev-ref HEAD)
origin="origin/$branch"
git reset --hard "$origin"

Finally, run git clean -df, or manually remove the downloaded files. Keep in mind that git clean -df removes all untracked files and directories, not just the patches.

Composing operations in Swift

Following up on From traditional to reactive, today’s problem is refactoring our image downloading system(s).

If I remember correctly, it all started a long time ago when we switched to AFNetworking and began using its UIImageView.setImageWithURL methods. To this day I still feel there’s something terribly wrong with a view calling networking code directly, but that’s not today’s problem. Today’s problem is how this system grew by adapting to each use case with ad-hoc solutions.

If I haven’t missed anything, we have:

  • AFNetworking’s setImageWithURL: you pass a URL and maybe a placeholder, and it eventually sets the image. It uses its own caching.
  • UIImageView.downloadImage(_:placeholderImage:), which is basically a wrapper around AFNetworking’s method that sets a few extra headers. It’s not obvious why it’s even there.
  • A newer UIImageView.downloadImage(_:), which skips AFNetworking and uses NSURLSession directly. It was recently developed for the sharing extension to remove the AFNetworking dependency. It creates the request, does the networking, sets the result, and adds caching. It also cancels any previous request if you reset the image view’s URL.
  • Then we have downloadGravatar and downloadBlavatar, which are just wrappers that take an email/hash or hostname and download the right Gravatar. At least these were refactored not long ago to move URL generation into a separate Gravatar type. Even so, the old methods seem to still be there and used in a few places.
  • WPImageSource: basically all it does is prevent duplicate requests. Imagine you just loaded a table to display some comments and you need to load the gravatars. Several of the rows are for the same commenter, so the naive approach would request the same gravatar URL several times. This coalesces all of those requests into one. In theory it has nothing to do with images and could be more generic, although it also handles adding authentication headers for private blogs.
  • WPTableImageSource: this one does quite a few things. The reason behind it was to load images for table view cells without sacrificing scrolling performance. So it uses Photon to download images at the size we need. Sometimes Photon doesn’t give us what we want: if the original image is smaller than the target size, we don’t want to waste bandwidth. In that case we resize locally. We cache both original and resized images. It also keeps track of the index path that corresponds to each image, so we can call the delegate when the image is ready. It supports invalidating the stored index paths when the table view contents change, so we don’t set images in the wrong cell.
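The request coalescing that WPImageSource does doesn’t need to know about images. Here is a minimal, hypothetical sketch of a generic version. RequestCoalescer and its fetch closure are illustrative names, not existing app code. It does no locking, so it assumes all calls happen on a single queue.

```swift
import Foundation

/// Hypothetical sketch: only one fetch runs per URL at a time; later callers
/// for the same URL wait for that fetch instead of starting their own.
final class RequestCoalescer<Value> {
    typealias Completion = (Value) -> Void
    typealias Fetch = (URL, @escaping Completion) -> Void

    private let fetch: Fetch
    private var pending: [URL: [Completion]] = [:]

    init(fetch: @escaping Fetch) {
        self.fetch = fetch
    }

    func request(_ url: URL, completion: @escaping Completion) {
        if pending[url] != nil {
            // A fetch for this URL is already in flight: just queue up.
            pending[url]?.append(completion)
            return
        }
        pending[url] = [completion]
        fetch(url) { [weak self] value in
            guard let self = self else { return }
            // Hand the single result to everyone who asked for this URL.
            let waiting = self.pending.removeValue(forKey: url) ?? []
            waiting.forEach { $0(value) }
        }
    }
}
```

In the comments example, every row for the same commenter would call request with the same gravatar URL, and only the first call would start a download.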

These are all variations of the same problem, built in slightly different ways. What I want to do is break all of this into composable units, each with a single responsibility.

So I built an example for one of the simple cases: a UIImageView.setImageWithURL(_:) that uses Photon to resize the image to the image view’s size, downloads the image, and sets it. There’s no caching, placeholders, or error handling, but it customizes the request a bit.

This is approximately what the original code looked like (changed a bit so the example is easier to follow):

The problem I ran into is that Photon has an issue resizing transparent images, so for those I want to skip Photon and do the resizing on the device. To do that with the original code, I see two options: duplicate it, or start adding options to the function to change its behavior. I’m not sure which is worse.

The first step is to break the existing method into its individual stages:

  1. “Photonize” the URL so we are asking for a resized image
  2. Build the request
  3. Fetch the data
  4. Turn it into an image
  5. Make sure it’s an image, or stop
  6. Set the image view’s image to that

So I’ve extracted most of those steps into functions and this is what I got:

This is much more functional, even if it looks sequential. I wrote it that way because I find it easier to read, but you could write the same thing as nested function calls.

If you’re into Lisp, you might find the nested version more pleasing, but in Swift it feels harder to follow than using intermediate variables. This is why other languages have things like a compose operator, which lets you combine functions without all that nesting. When I followed this route, I hit some limitations of Swift generics, and the result also looks very foreign. This might work well in Haskell, where all functions are curried, but not so much in Swift.
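For the synchronous stages, a forward-composition operator can be quite small. In this hypothetical sketch, the operator name >>> and the helpers photonize(width:) and makeRequest(_:) are illustrative. The Photon URL format used here is an assumption, not a documented one. The sketch composes "photonize the URL" with "build the request" and involves no networking.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

precedencegroup ForwardComposition {
    associativity: left
}
infix operator >>>: ForwardComposition

/// Forward composition: (f >>> g)(x) == g(f(x)).
func >>> <A, B, C>(f: @escaping (A) -> B, g: @escaping (B) -> C) -> (A) -> C {
    return { g(f($0)) }
}

/// Hypothetical step: rewrite a URL into an assumed Photon-style resized URL,
/// "https://i0.wp.com/<host><path>?w=<width>".
func photonize(width: Int) -> (URL) -> URL {
    return { url in
        var components = URLComponents()
        components.scheme = "https"
        components.host = "i0.wp.com"
        components.path = "/" + (url.host ?? "") + url.path
        components.queryItems = [URLQueryItem(name: "w", value: String(width))]
        return components.url!
    }
}

/// Hypothetical step: build a request with a customized header.
func makeRequest(_ url: URL) -> URLRequest {
    var request = URLRequest(url: url)
    request.setValue("image/*", forHTTPHeaderField: "Accept")
    return request
}

// The composed "prepare the request" stage, without nesting.
let prepareRequest = photonize(width: 200) >>> makeRequest
```

This stays readable only while every stage is synchronous. An asynchronous fetch stage doesn't fit the (A) -> B shape.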

Also note that the nesting pattern breaks at fetchData, since we don’t have a result yet. I would hardly call that callback hell. But we could easily start moving in that direction if we wanted all the image processing to happen on a background queue, or if image resizing were an asynchronous operation.

Then I tried RxSwift. Everything I wanted here could be done with a much simpler Future type, since I don’t need side effects, cancellation, or any operator other than map/flatMap. But I already had RxSwift in the project, so I’m using it to show the syntax. I also added a second version with the transform chain grouped into smaller pieces to improve readability.
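For comparison, a Future with only map and flatMap can be quite small. This is a hypothetical sketch, not RxSwift’s API:

- it has no error handling, cancellation, or thread hopping;
- the name Future and the run property are illustrative choices.

```swift
/// Hypothetical minimal Future: wraps work that eventually delivers a Value
/// to a callback. No errors, no cancellation, no threading guarantees.
struct Future<Value> {
    /// Starts the work; the callback receives the value when it is ready.
    let run: (@escaping (Value) -> Void) -> Void

    /// Transforms the eventual value with a synchronous function.
    func map<U>(_ transform: @escaping (Value) -> U) -> Future<U> {
        return Future<U> { callback in
            self.run { value in callback(transform(value)) }
        }
    }

    /// Chains another asynchronous step that depends on the eventual value.
    func flatMap<U>(_ transform: @escaping (Value) -> Future<U>) -> Future<U> {
        return Future<U> { callback in
            self.run { value in transform(value).run(callback) }
        }
    }
}
```

An asynchronous source would call its callback later, for example from a URLSession data task’s completion handler. map and flatMap don’t need to change for that, which is what keeps a fetch step from breaking the chain.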

What I haven’t tried yet is a solution based on NSOperation, but I have a feeling it would add a lot of boilerplate and wouldn’t feel quite right in Swift.

Finally, what I think I’d build is something based on the traditional version, with customization points. I’d love to flatten the data pipeline and be able to keep mapping over asynchronous values without depending on an external framework. For this example, though, callbacks don’t seem to complicate things much, at least not yet. So I think I’ll start from something like this.
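One way to read "the traditional version with customization points" is a callback-based loader whose steps are closures with sensible defaults. This sketch is hypothetical: ImagePipeline and its properties are illustrative names. It is generic over the image type, so it doesn’t depend on UIKit; in the app, Image would presumably be UIImage.

For the transparent-image case, you would do two things:

- keep rewriteURL as the identity instead of a Photon rewrite;
- set postProcess to a local resize function.

```swift
import Foundation

/// Hypothetical callback-based pipeline: each step is a replaceable closure.
struct ImagePipeline<Image> {
    /// Customization point: e.g. a Photon rewrite, or identity to skip it.
    var rewriteURL: (URL) -> URL = { $0 }
    /// Fetches raw data; nil means the download failed.
    var fetch: (URL, @escaping (Data?) -> Void) -> Void
    /// Turns data into an image, or nil if it isn't one.
    var decode: (Data) -> Image?
    /// Customization point: e.g. local resizing for transparent images.
    var postProcess: (Image) -> Image = { $0 }

    /// Runs the steps in order; stops silently if fetching or decoding fails.
    func load(_ url: URL, completion: @escaping (Image) -> Void) {
        fetch(rewriteURL(url)) { data in
            guard let data = data, let image = self.decode(data) else { return }
            completion(self.postProcess(image))
        }
    }
}
```

The default closures keep the common case short. Callers only pass what they need to change, which avoids both duplicating the method and growing a list of boolean options.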

Here’s a gist with all the examples and the helper functions: ImageServices.swift

If you have a better design, I’d love to hear about it.