Diving into Swift compiler performance

It all starts by reading this week in Swift, and the article The best hardware to build with Swift is not what you might think, written by the LinkedIn team about how apparently their Mac Pros are slower at building Swift than any other Mac.

I’ve spent so much time waiting for Xcode to compile over the past years than I’ve often toyed with the idea of getting an iMac or even a Mac Pro for the maximum possible performance, so this caught my attention. I’ve also been wondering if instead of throwing money at the problem, there might be some easy improvements to either reduce build time or to improve Swift performance.

Looking at the reported issue I discovered a couple Swift compiler flags that were new to me: -driver-time-compilation and -Xfrontend -debug-time-compilation, which will show something like this:

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 10.1296 seconds (10.6736 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.9556 ( 99.9%)   6.1701 (100.0%)  10.1257 (100.0%)  10.6697 (100.0%)  Type checking / Semantic analysis
   0.0013 (  0.0%)   0.0002 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  LLVM output
   0.0011 (  0.0%)   0.0001 (  0.0%)   0.0013 (  0.0%)   0.0013 (  0.0%)  SILGen
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  IRGen
   0.0003 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  LLVM optimization
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Parsing
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL optimization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  AST verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL verification (pre-optimization)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  SIL verification (post-optimization)
   3.9589 (100.0%)   6.1707 (100.0%)  10.1296 (100.0%)  10.6736 (100.0%)  Total

I started looking into the results for the WordPress app and it looks like almost every bottleneck is in the Type Checking stage. As I’d find later, mostly in type inference. I’ve only tested this on Debug builds, as that’s where improvements could impact development best. For Release, I’d suspect the optimization stage would take a noticeable amount of time.

Time to look at what’s the slowest thing, and why. Disclaimer: the next shell commands might look extra complicated. I’ve used those tools for over a decade but never managed to fully learn all their power, so I’d jump from grep to awk to sed to cut and back, just because I know how to do it that way. I’m sure there’s a better way, but this got me the results I wanted so ¯\(ツ)/¯.

Before you run these close Xcode, and really everything else if you can so you get more reliable results.

Do a clean build with all the debug flags and save the log. That way you can query it later without having to do another build.

xcodebuild -destination 'platform=iOS Simulator,name=iPhone 7' \
  -sdk iphonesimulator -workspace WordPress.xcworkspace \
  -scheme WordPress -configuration Debug \
  clean build \
  OTHER_SWIFT_FLAGS="-driver-time-compilation \
    -Xfrontend -debug-time-function-bodies \
    -Xfrontend -debug-time-compilation" |
tee profile.log

Print the compiled files sorted by build time:

awk '/Driver Time Compilation/,/Total$/ { print }' profile.log |
  grep compile |
  cut -c 55- |
  sed -e 's/^ *//;s/ (.*%)  compile / /;s/ [^ ]*Bridging-Header.h$//' |
  sed -e "s|$(pwd)/||" |
  sort -rn |
  tee slowest.log

Show the top 10 slowest files:

head -10 slowest.log
2.9555 WordPress/Classes/Extensions/Math.swift
2.8760 WordPress/Classes/Utility/PushAuthenticationManager.swift
2.8751 WordPress/Classes/ViewRelated/Post/AztecPostViewController.swift
2.8748 WordPress/Classes/ViewRelated/People/InvitePersonViewController.swift
2.8741 WordPress/Classes/ViewRelated/System/PagedViewController.swift
2.8699 WordPress/Classes/ViewRelated/Views/WPRichText/WPTextAttachmentManager.swift
2.8680 WordPress/Classes/ViewRelated/Views/PaddedLabel.swift
2.8678 WordPress/Classes/ViewRelated/NUX/WPStyleGuide+NUX.swift
2.8666 WordPress/Classes/Networking/Remote Objects/RemoteSharingButton.swift
2.8162 Pods/Gridicons/Gridicons/Gridicons/GridiconsGenerated.swift

Almost 3 seconds on Math.swift? That doesn’t make any sense. Thanks to the -debug-time-function-bodies flag, I can look into profile.log and see it’s all the round function. To make this easier, and since it doesn’t depend on anything else in the app, I extracted that to a separate file. In this case, the -Xfrontend -debug-time-expression-type-checking flag helped identifying the line where the compiler was spending all the time:

return self + sign * (half - (abs(self) + half) % divisor)

When you look at it, it seems pretty obvious that those are all Ints, right? But what’s obvious to humans, might not be to a compiler. I tried another flag -Xfrontend -debug-constraints which resulted in a 53MB log file 😱. But trying to make sense of it, it became apparent that abs was generic, so the compiler had to guess, and +,-,*, and % had also multiple candidates each, so the type checker seems to go through every combination rating them, before picking a winner. There is some good information on how the type checker works in the Swift repo, but I still have to read that completely.

A simple change (adding as Int) turns the 3 seconds into milliseconds:

return self + sign * (half - (abs(self) as Int + half) % divisor)

I’ve kept going through the list and in many cases I still can’t figure out what is slow, but there were some quick wins there. After 4 simple changes, build time was reduced by 18 seconds, a 12% reduction.

Composing operations in Swift

Continuing on From traditional to reactive, the problem I’m solving today is refactoring our image downloading system(s).

If I remember it correctly, it all started a long time ago when we switched to AFNetworking, and started using its UIImageView.setImageWithURL methods. To this day I still feel that there’s something terribly wrong in a view calling networking code directly, but that’s not today’s problem. Instead it’s how this system grew by adapting to each use case with ad-hoc solutions.

If I haven’t missed anything, we have:

  • AFNetworking’s setImageWIthURL: you pass a URL and maybe a placeholder, and it eventually sets the image. It uses its own caching.
  • UIImageView.downloadImage(_:placeholderImage:), which is basically a wrapper for AFNetworking’s method which sets a few extra headers. Although it’s not very obvious why it’s even there.
  • A newer UIImageView.downloadImage(_), which skips AFNetworking and uses NSURLSession directly. This was recently developed for the sharing extension to remove the AFNetworking dependency. It creates the request, does the networking, sets the result, and it adds caching. It also cancels any previous requests if you reset the image view’s URL.
  • Then we have downloadGravatar and downloadBlavatar which are just wrappers that take an email/hash or hostname and download the right Gravatar. At least these were refactored not long ago to move the URL generation into a separate Gravatar type. Although it seems the old methods are also still there and used in a few places.
  • WPImageSource: basically all it does is prevent duplicate requests. Imagine you just loaded a table to display some comments and you need to load the gravatars. Several of the rows are for the same commenter, so the naive approach would request the same gravatar URL several times. This coalesces all of the requests into one. This doesn’t really have anything to do with images in theory, and could be more generic, although it also handles adding authentication headers for private blogs.
  • WPTableImageSource: this one does quite a few things. The reason behind it was to load images for table view cells without sacrificing scrolling performance. So it uses Photon to download images with the size we need. If Photon doesn’t give us what we want (if the original image is smaller than the target size, we don’t want to waste bandwidth) we resize locally. We cache both original and resized images. It also keeps track of the index path that corresponds to an image so we can call the delegate when it’s ready, and it supports invalidating the stored index paths when the table view contents have changed, so we don’t set the images in the wrong cell.

These are all variations of the same problem, built in slightly different ways. What I want to do is take all of this into composable units, each with a single responsibility.

So I built an example for one of the simple cases: implement a UIImageView.setImageWithURL(_) that uses Photon to resize the image to the image view’s size, downloads the image and sets it. No caching, placeholders or error handling, but it customizes the request a bit.

This is what the original code looked like approximately (changed a bit so the example is easier to follow):

func old_setResizedImage(withURL url: NSURL) {
let photonUrl = photonize(url: url, forSize: frame.size)
let request = NSMutableURLRequest(URL: photonUrl)
request.HTTPShouldHandleCookies = false
request.addValue("image/*", forHTTPHeaderField: "Accept")
let session = NSURLSession.sharedSession()
let task = session.dataTaskWithRequest(request) { [weak self] data, response, error in
let scale = UIScreen.mainScreen().scale
guard let data = data,
let image = UIImage(data: data, scale: scale) else {
return
}
self?.image = image
}
task.resume()
}

view raw
old.swift
hosted with ❤ by GitHub

The problem I encountered is that Photon has an issue resizing transparent images, so for this I want to skip photon and do the resizing on the device. If you want to do that with the original code I see two options: duplicate it, or start adding options to the function to change its behavior. I’m not sure which option is worse.

The first step is to break the existing method into steps:

  1. “Photonize” the URL so we are asking for a resized image
  2. Build the request
  3. Fetch the data
  4. Turn it into an image
  5. Make sure it’s an image, or stop
  6. Set the image view’s image to that

So I’ve extracted most of those steps into functions and this is what I got:

func traditional_setResizedImage(withURL url: NSURL) {
let resizedURL = photonize(url: url, forSize: frame.size)
let request = requestForImageUrl(resizedURL)
fetchData(request) { data in
guard let image = imageFromResponseData(data) else {
return
}
dispatch_async(dispatch_get_main_queue(), {
applyImage(image, to: self)
})
}
}

view raw
traditional.swift
hosted with ❤ by GitHub

This is much more functional, even if it looks sequential. It’s written that way because I find it’s easier to read, but you could write the same thing using nested functions.

func functional_setResizedImage(withURL url: NSURL) {
fetchData(
requestForImageUrl(
photonize(url: url, forSize: frame.size)
)
) { data in
unwrapOptional(
imageFromResponseData(data),
then: applyImageInMainQueue(to: self)
)
}
}

view raw
functional.swift
hosted with ❤ by GitHub

If you’re into Lisp, you might find this version more pleasing, but in Swift this feels harder to follow than using intermediate variables. This is why other languages have things like the compose operator which you can use to compose functions without all that nesting. When I followed this route, I hit some limitations on Swift generics, and it also looks very foreign. This might work well in Haskell where all functions are curried, but it’s not so much in Swift.

infix operator { associativity right precedence 140 }
func <T, U, V>(g: U -> V, f: T -> U) -> T -> V {
return { t in
return g(f(t))
}
}
func composed_setResizedImage(withURL url: NSURL) {
let fetchResizedUrl = fetchData requestForImageUrl photonize(forSize: frame.size)
fetchResizedUrl(url)({ data in
unwrapOptional(
imageFromResponseData(data),
then: applyImageInMainQueue(to: self)
)
})
}

view raw
composed.swift
hosted with ❤ by GitHub

Also note that the pattern of nesting breaks on fetchData as we don’t have a result yet. I would hardly call that callback hell, but we can start moving in that direction easily if we say we want all the image processing to happen in a background queue. Or if image resizing was an asynchronous operation.

func concurrent_setResizedImage(withURL url: NSURL) {
let backgroundQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
let mainQueue = dispatch_get_main_queue()
dispatch_async(backgroundQueue) { [frame] in
let resizedURL = photonize(url: url, forSize: frame.size)
let request = requestForImageUrl(resizedURL)
fetchData(request) { data in
dispatch_async(backgroundQueue, {
guard let image = imageFromResponseData(data) else {
return
}
dispatch_async(mainQueue, {
applyImage(image, to: self)
})
})
}
}
}

view raw
concurrent.swift
hosted with ❤ by GitHub

Then I tried with RxSwift. All I wanted for this could be done with a much simpler Future type, since I don’t need side effects, cancellation, or any operator other than map/flatMap. But I already had RxSwift on the project so I’m using it as an example of the syntax. I also added a second version with the transform chain grouped into smaller pieces to improve readability.

func rx_setResizedImage(withURL url: NSURL) -> Disposable {
let backgroundScheduler = ConcurrentDispatchQueueScheduler(globalConcurrentQueueQOS: .Default)
let mainScheduler = MainScheduler.instance
return Observable
.just(url)
.observeOn(backgroundScheduler)
.map(photonize(forSize: frame.size))
.map(requestForImageUrl)
.flatMap(rx_fetchData)
.observeOn(backgroundScheduler)
.map(imageFromResponseData)
.flatMap(unwrapOptional(ImageDownloadError.NilImage))
.observeOn(mainScheduler)
.subscribeNext(applyImage(to: self))
}
func rx2_setResizedImage(withURL url: NSURL) -> Disposable {
let backgroundScheduler = ConcurrentDispatchQueueScheduler(globalConcurrentQueueQOS: .Default)
let mainScheduler = MainScheduler.instance
let request = Observable
.just(url)
.observeOn(backgroundScheduler)
.map(photonize(forSize: frame.size))
.map(requestForImageUrl)
let image = request
.flatMap(rx_fetchData)
.observeOn(backgroundScheduler)
.map(imageFromResponseData)
.flatMap(unwrapOptional(ImageDownloadError.NilImage))
return image
.observeOn(mainScheduler)
.subscribeNext(applyImage(to: self))
}

view raw
rx.swift
hosted with ❤ by GitHub

What I haven’t tried yet is a solution based on NSOperation, but I have the feeling that it would add a lot of boilerplate, and wouldn’t feel completely right in Swift.

Finally, what I think I’d build is something based on the traditional version with customization points. I’d love to flatten the data pipeline and be able to just keep mapping over asynchronous values without depending on an external framework. For this example though, it seems that callbacks don’t complicate things much, at least yet. So I think I’ll start from something like this.

func identity<T>(value: T) -> T {
return value
}
extension UIImageView {
func composable_setResizedImage(processUrl proccessUrl: (NSURL -> NSURL) = identity, processImage: (UIImage -> UIImage) = identity) -> NSURL -> Void {
return { url in
let processedURL = proccessUrl(url)
let request = requestForImageUrl(processedURL)
fetchData(request) { data in
guard let image = imageFromResponseData(data) else {
return
}
let processedImage = processImage(image)
dispatch_async(dispatch_get_main_queue(), {
applyImage(processedImage, to: self)
})
}
}
}
func downloadAndResizeLocally(url: NSURL) {
let download = composable_setResizedImage(processImage: resizeImage(to: frame.size))
download(url)
}
}

view raw
composable.swift
hosted with ❤ by GitHub

Here’s a gist with all the examples and the helper functions: ImageServices.swift

If you have a better design, I’d love to hear about it.