Monday, January 25, 2016

Why you should (sometimes) NOT use tail recursion in Scala

There was a recent post on /r/scala (direct article link) about how great tail recursion is. I agree with everything said in that article; this isn't an attempt to refute his points. But tail recursion has a dark side: it can be a huge hassle.

Non-tail recursive code has a very useful property: as each invocation on the stack completes, the previous invocation picks up exactly where it left off. So you can do some work, recurse, and then do more work when the recursion finishes. You can even recurse in a loop, meaning that the amount of work left to be done is dynamic and only known at runtime. In order to use tail-recursion, you have to restructure your code so that there is no work to be done after the recursive call returns. While it's always possible to restructure your code in this way, it can be a nontrivial transformation. In complex cases, you may even need to use your own stack to keep track of remaining work. Sure, it's not the call stack, so you don't need to worry about blowing up when your collection gets to be too large. But with that property comes a bit of complexity.


For example, I recently wrote an iterative implementation of Tarjan's topological sort of a directed graph. The overall algorithm is described well enough on Wikipedia. As you can see, the recursion occurs within a loop and with additional work to be done after all the recursion is complete.

Here's my Scala implementation. I won't promise that it's the best code, but it seems to work. I have more comments in the actual code, but I've stripped them here for brevity.

import scala.annotation.tailrec

def topologicalSort[A](edgeMap: Map[A, Set[A]]): Seq[A] = {
  @tailrec
  def helper(unprocessed: Seq[Seq[A]], inProgress: Set[A], finished: Set[A], result: Seq[A]): Seq[A] = {
    unprocessed match {
      case (hd +: tl) +: rest => // [ [hd, ...], ... ]
        if (finished(hd)) {
          helper(tl +: rest, inProgress, finished, result)
        } else {
          if (inProgress(hd)) {
            throw new Exception("Graph contains a cycle")
          }
          val referencedVertices = edgeMap(hd)
          helper(referencedVertices.toSeq +: (hd +: tl) +: rest, inProgress + hd, finished, result)
        }
      case Nil +: (hd +: tl) +: rest => // [ [], [hd, ...], ... ]
        helper(tl +: rest, inProgress - hd, finished + hd, hd +: result)
      case Nil +: Nil => // [ [] ]
        result
    }
  }

  helper(edgeMap.keys.toSeq +: Nil, Set.empty, Set.empty, Nil)
}

(Assume that edgeMap contains one key for every vertex in the graph, even if the corresponding value is the empty set. This is an invariant that is enforced elsewhere.)

That unprocessed parameter is, essentially, the call stack. The outer Seq is used as a stack, while the inner Seq is used as a queue. Whenever we decide to visit a node, we push a new "frame" onto the front of that stack (and move the node into inProgress). Whenever the frame at the front of the stack is empty, it means that we have finished recursing children and can move the "current node" (encoded as the first element in the next frame) from inProgress to finished. And when we reach a state where the stack contains just one frame, and that frame is empty, we are done.
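
To make that concrete, here's a tiny example I traced through the implementation above (the vertex names are made up; note that c still gets a key, per the invariant):

val edges = Map(
  "a" -> Set("b"),
  "b" -> Set("c"),
  "c" -> Set.empty[String]
)

topologicalSort(edges) // Seq("a", "b", "c"): each vertex comes before the vertices it references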

While this implementation won't blow the stack (at least, I don't think it will... it successfully sorted a graph with a path 10k vertices long), I wouldn't necessarily describe it as easy to understand. In my actual source code, the comments are nearly as long as the implementation. That in and of itself isn't a problem, but it's a shame that the source isn't more readable. (Though I'll freely admit that perhaps the lack of readability is my own fault.)

In fact, an astute reader might notice that the match expression is missing a case. What happens if the sequence looks like this:

[ [], [], ... ]

That is, why don't we have a pattern match clause that looks like this:

case Nil +: Nil +: rest => ???

This particular case can never occur. An invariant of this implementation is that, apart from the first queue in the stack, no other queue can be empty. Again, this fact is pointed out in a comment... a comment that isn't needed in the non-tail recursive version.


The author points out that tail recursion causes the compiler to rewrite your apparently recursive function as a loop. He also demonstrates a case where tail recursion is shorter (and, I'd agree, more readable) than an explicit loop. But there are cases where the opposite is true. One that I've come across a few times is what I will call "partitioning by type". Suppose you have a union type:

case class Point(x: Float, y: Float) // defined elsewhere in my code; included so the example is self-contained

sealed trait Shape
case class Circle(c: Point, r: Float) extends Shape
case class Rectangle(lowerLeft: Point, w: Float, h: Float) extends Shape
case class Triangle(v1: Point, v2: Point, v3: Point) extends Shape
case class Polygon(vs: Seq[Point]) extends Shape

And suppose you have a Seq[Shape]. But you would like to split it into independent lists: a Seq[Circle], a Seq[Rectangle], a Seq[Triangle], and a Seq[Polygon]. A tail recursive implementation might look like this:

type GroupShapesByTypeResult = (Seq[Circle], Seq[Rectangle], Seq[Triangle], Seq[Polygon])

def groupShapesByType(shapes: Seq[Shape]): GroupShapesByTypeResult = {
  @tailrec
  def helper(remaining: Seq[Shape], circles: Seq[Circle], rectangles: Seq[Rectangle], 
             triangles : Seq[Triangle], polygons : Seq[Polygon]): GroupShapesByTypeResult = {
    remaining match {
      case Nil =>
        (circles, rectangles, triangles, polygons)
      case (hd: Circle) +: rest =>
        helper(rest, circles :+ hd, rectangles, triangles, polygons)
      case (hd: Rectangle) +: rest =>
        helper(rest, circles, rectangles :+ hd, triangles, polygons)
      case (hd: Triangle) +: rest =>
        helper(rest, circles, rectangles, triangles :+ hd, polygons)
      case (hd: Polygon) +: rest =>
        helper(rest, circles, rectangles, triangles, polygons :+ hd)
    }
  }

  helper(shapes, Nil, Nil, Nil, Nil)
}

Not terribly readable. But wait. We don't need to manage recursion ourselves; we could just use a fold:

def groupShapesByType(shapes: Seq[Shape]): GroupShapesByTypeResult = {
  val init: GroupShapesByTypeResult = (Nil, Nil, Nil, Nil)

  shapes.foldLeft(init) {
    (acc, shape) =>
      val (circles, rectangles, triangles, polygons) = acc
      shape match {
        case c : Circle =>
          (circles :+ c, rectangles, triangles, polygons)
        case r : Rectangle =>
          (circles, rectangles :+ r, triangles, polygons)
        case t : Triangle =>
          (circles, rectangles, triangles :+ t, polygons)
        case p : Polygon =>
          (circles, rectangles, triangles, polygons :+ p)
      }
  }
}

We've traded explicit loop management for more complex destructuring. Arguably more readable, but still not great. OK, what if we abandoned this functional approach (Scala is multi-paradigm after all) and went with an explicit loop and mutability instead:

def groupShapesByType(shapes: Seq[Shape]): GroupShapesByTypeResult = {
  var circles : Seq[Circle] = Nil
  var rectangles : Seq[Rectangle] = Nil
  var triangles : Seq[Triangle] = Nil
  var polygons : Seq[Polygon] = Nil

  for (shape <- shapes) {
    shape match {
      case c : Circle => circles = circles :+ c
      case r : Rectangle => rectangles = rectangles :+ r
      case t : Triangle => triangles = triangles :+ t
      case p : Polygon => polygons = polygons :+ p
    }
  }

  (circles, rectangles, triangles, polygons)
}

Since the patterns and corresponding bodies are so much simpler, I took the liberty of combining them into single lines. I don't think it hurts readability. I don't know for sure, but I would even expect this implementation to run faster than either of the other implementations. The tail recursive version needs to successively chop the front off our list. And the version with foldLeft needs to constantly decompose and rebuild the loop state variable. This implementation just walks an iterable and updates the corresponding sequence. Persistent collections are awesome, folds are awesome, but walking an iterator and updating vars is hard to beat.
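
Whichever version you prefer, all three are drop-in replacements for one another. A quick, made-up usage example:

val shapes: Seq[Shape] = Seq(
  Circle(Point(0, 0), 1),
  Rectangle(Point(1, 1), 2, 3),
  Circle(Point(5, 5), 2)
)

// circles gets both Circles, rectangles gets the Rectangle, and
// triangles and polygons both come back empty
val (circles, rectangles, triangles, polygons) = groupShapesByType(shapes)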


Again, I'm not trying to refute anything that the original post's author is saying. Tail recursion is great. But tail recursion comes with the cost of complexity. For situations with relatively shallow recursion trees (and with a clear upper bound to the recursion depth), I'm actually OK with non-tail recursion. For example, using non-tail recursion to traverse an XML document that is known to be fairly flat is perfectly fine. It might even be fine for traversing a parsed AST for a programming language. Sure, most programming languages allow expressions to be nested to arbitrary depths, but most code written by reasonable humans has a soft upper bound on how deeply those expressions are nested.

If you can naturally express your algorithm with tail recursion, go for it! But if it's unnatural, consider whether tail recursion is actually needed.

Friday, September 04, 2015

How JetBrains Lost Years of Customer Loyalty in Just a Few Hours

NOTE: This post was originally written after JetBrains announced a controversial new licensing model. Many people spoke out about it. The next day, they followed up to say that they were listening to the feedback, and two weeks later made a final post with significant refinements to their original announcement. But this post predates both of those follow-ups. Keep that in mind when reading.


Yesterday's big news, at least for many developers, is that JetBrains - maker of popular tools like IntelliJ and ReSharper - is moving to a software-as-a-service subscription model for their products.

Previously, buying a JetBrains product got you a perpetual license and a year of upgrades. Once the license expired, any software you had received under that license would continue to work, but you would need to buy another license to get further upgrades. It was a simple model that worked just fine for many people, and most customers upgraded every year.

Starting November 2, though, that all stops. After that date, JetBrains will no longer sell these perpetual licenses. Instead, you can rent access to their software on a month-by-month basis.

And there was much raging.

Now, don't get me wrong. A subscription model has apparently been a common request, and some of the feedback to the announcement has been positive. This arrangement is especially good for consulting shops that do one project in C# and the next project in Java. Rather than committing to a year of use, they can choose to only pay for what they need in any given month.

But that's just one type of customer. There are also plenty of single-platform shops. I know a lot of people who just need ReSharper or IntelliJ. These customers will probably notice no big difference - they will renew for a year at a time, and probably get a small discount as well.

This all sounds great! What's the problem?

The first change, and probably the biggest, is that the software will apparently stop working when you stop paying for your subscription. That's probably going to impact indie developers the most. For a developer with an unstable income, it might be perfectly fine to stay on an older version of the software until they've stashed enough cash to afford the upgrade. That will no longer work. But it's not just indie developers. I've seen companies who forget to renew their licenses promptly or who have long and convoluted processes to approve the expenditure. I guess, under the new model, development grinds to a halt until the purchase goes through.

Another controversial aspect is that the software will need to phone home. Now, JetBrains has given a gracious window - the software only has to dial the mothership once every 30 days. And customers in an internet-restricted environment will be able to install a license server inside their network to manage the license pool. This is not an uncommon practice for enterprise or specialized software. But it does create an interesting challenge. The licensing FAQ indicates that it's allowed for an employee to use their personal license at work; I've often taken advantage of this. But it doesn't look like the JetBrains license server supports personal licenses. For people in an internet-restricted environment, it looks like this perk is no longer available.

OK, so users lose some ability that they previously had, but the software is cheaper, right? Customers win a little and lose a little, so maybe it's a wash. Yeah, the software is cheaper... sort of. My last IntelliJ upgrade was $99 for the year. Under the new model, I'll only pay $89 for a year. Huzzah! Well, that's only applicable for users who already own IntelliJ. New users will pay $119 per year, which is a lot less than the old introductory price of $199. But here's the deal: if I ever let my subscription lapse, it looks like I end up losing my grandfathered discount. And even then, the prices given are all listed as promotional prices that are only good until Jan 31, 2016. Is this a sign that the prices will jump in the near future? JetBrains certainly tried to promote this new licensing model by saying that it would make their software more affordable. It does make it cheaper, especially for new users (which is great!), but the situation for existing users is murkier. It's only cheaper for me if I keep renewing promptly. If I ever miss a renewal, my yearly costs jump by about a third (from $89 back to $119).

But none of those details really explain why the internet got so upset. I think JetBrains miscalculated just how much people like the current licensing model. Sure, offering a subscription-oriented model makes sense for some kinds of customers. But there are many other customers for whom a subscription model is going to be worse. JetBrains indicated that this change is being made primarily to provide a better service to their customers. The feedback that they got today is that many customers don't see the new scheme as an improvement. Now, JetBrains has said (update 3) that they would take the feedback under consideration, which is definitely a good sign.

It's always awkward when a company says "this is good for our customers" and the customers respond with "no it's not". We saw this a few years ago with Adobe. In that case, it was completely clear that they didn't care what their customers wanted. They had decided on a course of action and nothing could stop that train. But I don't think anybody was surprised to see Adobe go in that direction. People liked Adobe's products, but I don't know that anybody really liked Adobe as a company. JetBrains was different. They built a loyal customer base on quality software and reasonable policies. JetBrains products had become the examples people used when saying "you know, open-source is great and all, but I'm happy to pay for quality software". When I read some of the responses to yesterday's announcement, I get the impression that existing customers feel a sense of betrayal. They're confronted with the idea that maybe JetBrains is no different from Adobe. Maybe all the goodwill that they felt for this company was misplaced.

Ultimately, JetBrains's response to this kerfuffle will show the underlying motivation behind this change. Will they listen to the feedback and truly offer licensing options that keep everybody happy? Or will they double down on the software-as-a-service model, in the hopes that the controversy will just blow over?

Of course, listing the problems isn't super useful. If anybody from JetBrains reads this, I do have some suggestions for what you could do to appease the crowds:

  • Continue to offer perpetual licenses. I don't think people are bothered by you offering subscription licensing; indeed, some customers seem to prefer it. But for customers who are happy with the status quo, being forced to switch, and threatened with software that could suddenly stop working, is a really hard pill to swallow.
  • Or... require that corporate licenses be subscription-based, but continue to offer perpetual, personal licenses. I'm guessing that most of the people upset with this change are people who are currently using personal licenses. These are probably your most loyal, and also most vocal, customers. These are the kinds of people that get your products into an enterprise environment. At least keep them happy.
  • Take another look at your pricing. You're asking users to replace perfectly functional software with software that has a coin slot; if you stop feeding money into the meter, the software stops working. You have to give those users something in return. If you did something drastic - like cutting those prices in half - people might be far more willing to accept this software-as-a-service model.
  • Offering lower introductory prices is great! But you don't need to fundamentally change your pricing model to do that. You've offered sales before - I got my initial ReSharper and IntelliJ licenses during your end-of-the-world sale back in 2012. If you want to attract new users, you could just, you know, lower your buy-in price. Heck, you could even raise your renewal prices by 10%. I suspect that such a change wouldn't have even raised an eyebrow.
  • (Late edit after reading more comments) As a reward for subscribing for a year or more at a time, issue perpetual licenses for products released during that time. If I subscribe for a month and then let my subscription lapse, my software stops working. But if I subscribe for a year and THEN let my subscription lapse, any software released during the window continues to work. This creates a situation where JetBrains keeps making money, but customers aren't punished for letting their subscription lapse.

I was a huge Eclipse fan back in 2010, but a friend convinced me to switch to IntelliJ and I've been a loyal user since. I'm not writing as an outside observer, but as a concerned customer. Now, JetBrains doesn't really care about my business; I'm guessing that I pay for something like 4 of their developer hours per year. But people like my friend, and now me, are vital to JetBrains growing their business. I pushed and pushed to get ReSharper installed on all my coworkers' machines; that ended up being something like 10 corporate licenses, which pays for a lot more development time. JetBrains got to where they are today by building a very loyal fanbase. I hope they realize that alienating that fanbase could tear them back down.

Thursday, June 18, 2015

A quick overview of what WebAssembly is and what it is not (yet)

There's been a lot of buzz today about WebAssembly, and that's completely understandable. A bytecode form of Javascript has been on the minds of many web developers for a long time. But the online commentary seems to have been based more on hopes than on released information. I hope to clear up some of that confusion. Note that I'm not involved in the project at all, so some of this information is likely incorrect; feel free to leave a comment if I've gotten something wrong.

WebAssembly isn't a spec yet. It's not even a draft spec. It's an idea and a proof-of-concept. So far, that's all that's in the public. But it has a lot of promise.

The WebAssembly roadmap essentially spells out three phases:

  1. A minimum viable product (MVP) that is roughly analogous to asm.js, specifically targeting C/C++ as the source language
  2. Additional features, such as threads, SIMD, and proper exception handling, all still with a focus on C/C++
  3. "Everything else", including features meant to support more languages. One such feature is access from WebAssembly code to objects on the garbage-collected Javascript heap. This could enable WebAssembly code to access the DOM and web APIs (which would not be supported in either of the previous iterations). This phase also contains support for things like large heaps, coroutines, tail-call optimization, mmapping files, etc. If and when we get to this phase, I would expect it to get further split.

There will also be a polyfill, since browsers will not initially support WebAssembly. In fact, there is already a polyfill, but it's more of a proof-of-concept than an actual implementation. There is no WebAssembly spec yet; this polyfill is merely to test the viability of a binary-encoded AST.

Initially, it's not wrong to think of WebAssembly as binary-encoded asm.js, though that will change over time. It may eventually grow to the point that it could replace Javascript, but in its first two incarnations, you will still need some JS code (to interact with the DOM and with web APIs). A likely use case is to compile low-level, algorithmic code (think image processing, compression, or encryption code) to WebAssembly, but to still write the bulk of your application in plain JS. If you're not familiar with asm.js, it might make sense to look into it; many of the same limitations of that environment also appear to apply to the first incarnation of WebAssembly.

However, even at this early stage, WebAssembly specifies some features that asm.js doesn't provide. In particular, it talks about 64-bit integer operations, which asm.js can't natively provide. The current polyfill POC doesn't seem to support them, and they may choose for the polyfill to always implement them as floating-point operations, but an actual runtime would need to provide proper support for int64 (and other sizes as well, like int8 and int16).

WebAssembly only deals with the binary format and runtime environment; it doesn't provide any new APIs for dealing with the DOM or the network or anything like that. It may eventually provide alternatives to other web APIs (WebWorkers would be an obvious one, when WebAssembly eventually adds threading support).

The current plan for WebAssembly is to not use bytecode in the same way as JVM or CLR use bytecode. Those VMs represent their bytecode as an instruction stream, similar to instructions for non-virtual machines. WebAssembly, on the other hand, appears to be going more for a "serialized abstract syntax tree" form. The claim seems to be that this can be compressed more efficiently than can be achieved using general-purpose compression routines, and that doesn't seem too crazy. This distinction isn't important for web app developers, though it could mean that "disassembled" WebAssembly is easier to grok than disassembled JVM bytecode.
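
To illustrate the distinction, here's a sketch in Scala of the two shapes an encoding can take (purely illustrative; this is not actual WebAssembly):

// The expression (2 * 3) + 4 as an explicit tree, roughly the shape
// that WebAssembly appears to be serializing...
sealed trait Expr
case class Const(n: Int) extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Mul(l: Expr, r: Expr) extends Expr

val asTree: Expr = Add(Mul(Const(2), Const(3)), Const(4))

// ...versus the flat, stack-machine instruction stream that a JVM- or
// CLR-style bytecode would use for the same expression.
val asInstructions = Seq("push 2", "push 3", "mul", "push 4", "add")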

WebAssembly will have defined binary AND text forms. So, while you won't be able to curl a WebAssembly file and immediately understand it, there will be tooling to convert from the binary form to the text form (which will certainly eventually be built into browsers). It seems likely that the bytecode semantics will make concessions to retain some degree of human readability when converted to text form.

I really like this quote from Peter Kasting in the comments on Ars. Like, seriously, if you have an Ars account, go give him some fake internet points.

Notably, the people working on WebAssembly are the PNaCl (Google) and asm.js (Mozilla) teams. In some sense this can be considered the followon effort to those projects, meant to combine the best attributes from each, and in a way that can be agreed on by all browser vendors.

This is exactly how the system is supposed to work: individual teams try to advance the state of the art, and eventually all those lessons learned are incorporated into a new and better system. See e.g. SPDY -> HTTP2. WebAssembly draws on both the past work and the experience of all those involved, and wouldn't be what it is without them.

I say this partly so the sorts of people who have bemoaned "non-standard" vendor efforts in the past may have reason to pause next time they feel the urge to do so. No one wants a balkanized world forever, but that vendor-specific effort may gain the sort of real-world experience necessary to come back and design a great cross-vendor solution afterwards. All four major browser vendors have taken flak for this sort of thing in the past, and in my mind often unfairly so.

All told, it looks like this is just the first, small step. I'm a little surprised that they went public so early. Either they really plan to develop it in the open, or the tech media got wind of it and blew it well out of proportion. Whatever the case, this looks like it will be a large undertaking. It's very promising that everybody seems to be at the table. And given that the browser vendors appear to want to actively develop their products, we might actually be able to use this within a couple of years.

What a time to be alive!

Tuesday, June 16, 2015

The joys of parsing a toy programming language

In my personal time, I've been playing at building a toy programming language. Progress has been slow, but things are finally starting to come together. Last night, I ended up finding an interesting bug, and I wanted to document it here.

The syntax of my language isn't super complicated. Here's an example of a function:

def foo(a, b) = a + b

Because this language is a so-called modern language, it has to support lambdas as well. Here's an example:

val bar = (a, b) => a + b

You can call functions and lambdas using the same syntax:

foo(5, 7)
bar(5, 7)

Naturally, since lambdas are first-class, we need to support more complicated call syntax as well. In particular, this should be allowed:

((a, b) => a + b)(5, 7)

In order to support that, here's the section of the grammar that describes function calls (you can imagine the rest of the grammar):

FnCall    :=  Callable lParen Args rParen
Callable  :=  identifier
          :=  lParen Expression rParen
Args      :=  Expression MoreArgs
          :=  epsilon
MoreArgs  :=  comma Expression MoreArgs
          :=  epsilon

My compiler was able to successfully compile and run this extremely simple program:

val bar = (a, b) => a + b
bar(5, 7)

But imagine my surprise when it failed to compile this program:

val bar = (a, b) => a + b
(bar)(5, 7)

Looking at the parse tree, it was obvious what had happened. It turns out that my grammar was ambiguous, and the parser had chosen this interpretation:

val bar = (a, b) => a + (b(bar))(5, 7)

Rather than parsing this as a definition and a corresponding expression, it instead parsed it as a single definition. It assumed that b was a function of one parameter, which returned a function of two parameters. Even this simpler grammar has the same problem:

FnCall    :=  identifier lParen Args rParen
Args      :=  Expression MoreArgs
          :=  epsilon
MoreArgs  :=  comma Expression MoreArgs
          :=  epsilon

Given this program:

val bar = (a, b) => a + b
(5 + 7) * 2

This could (and would) get parsed as:

val bar = (a, b) => a + b(5 + 7) * 2

(It's worth noting that, if I had been using a shift/reduce parser, this would have been detected as an S/R conflict. But I'm not using an S/R parser; I'm using the Scala parser combinator library, mostly because it was easy to get started and I haven't outgrown it yet.)

Looking at Scala, from which I stole a lot of my syntax, I found that newline handling is a bit complicated, though the rules are laid out plainly in section 1.2 of the Scala language spec. Essentially, a newline can be treated either as plain whitespace or as a statement terminator, depending on the context. Within a parenthesized expression, newlines are always treated as whitespace. Outside an expression, newlines are treated as statement terminators if they appear between a token that could end a statement and a token that could begin a statement. On the one hand, having written a fair amount of Scala, I find the rules pretty natural and intuitive. However, you do end up with strange behavior at the edges, as the following example demonstrates:

(succ
  (5))   => 6

{succ
  (5)}   => 5

On the other hand, Haskell (with which I'm admittedly not very familiar) appears to use indentation level to determine expression grouping. That is, while this is valid:

main = putStrLn "hi"

and this is also valid:

main = putStrLn 
 "hi"

this is not:

main = putStrLn 
"hi"

Context is not considered, so unlike Scala, this still fails:

main = (putStrLn 
"hi")

Both of these approaches have merit. Haskell's approach is natural enough, though there are some unambiguous representations that it would reject. Scala's approach is more permissive, at the cost of more complicated implementation rules and more surprising behavior. I can't tell which is more appropriate for my language.

For now, I took neither approach. I realized that, as far as I know, I have but one case where expressions are ambiguous - when the parameters to a function appear on a separate line from the Callable itself. It was possible for me to simply require that the whitespace between the Callable and its parameters not include any newlines. So an identifier at the end of a line will NEVER be considered to be a function call. I don't think this will remain this way forever, but I think it will get me unstuck.
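
In case it's useful, here's roughly what that looks like with the parser combinator library (a simplified sketch, not my actual grammar; the names are made up):

import scala.util.parsing.combinator.RegexParsers

object Calls extends RegexParsers {
  // Only spaces and tabs are skipped implicitly. Newlines stay visible
  // to the grammar, so they can act as statement terminators elsewhere.
  override val whiteSpace = "[ \t]+".r

  def identifier: Parser[String] = "[a-zA-Z_][a-zA-Z0-9_]*".r

  // Because newlines are never skipped before the lParen, "bar(5, 7)"
  // parses as a call, but "bar\n(5, 7)" does not.
  def fnCall: Parser[(String, List[String])] =
    identifier ~ ("(" ~> repsep(identifier, ",") <~ ")") ^^ {
      case callable ~ args => (callable, args)
    }
}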

Monday, March 04, 2013

Updating a System Shock 2 Wallpaper for HD Resolutions (in Javascript!)

Back when I was in college, I was a big System Shock 2 fan. My favorite co-op experience of all time was when my dorm roommate and I played SS2 together. I had all kinds of ideas for case mods (even though I had neither the money nor the tools to make it happen). Ultimately, my only creative contribution to the world of Shock was to combine two wallpapers that were floating around the net into one of my very own:

Yeah, I was pretty pleased with myself back in the day. In any case, I wanted to commemorate the recent GoG re-release of SS2 by inviting Shodan to adorn my desktop yet again. Unfortunately, screen resolutions have increased quite a bit in the intervening years, and a pixelated Shodan simply won't do. Fortunately, in the GoG re-release, they included a ludicrously high resolution 5100x3338 pixel render. All I need to do is to scale that down, generate the ASCII half, blend them, and Bob's your uncle.

I'm sure that there are a lot of image to ASCII generators out there, but I can't shy away from a chance to learn something, so I decided to try to write my own. That's not even the interesting part of the story. Because I'm a masochist, I decided to do it with HTML and Javascript. I figured that, between the drag-and-drop API, canvas, and a high-performance JS engine like V8, I could probably get away with it.

Many hours later, I have something that basically works. I'll probably clean it up and get it posted to GitHub. It wasn't too hard to allow dropping an image file onto the page. I end up doing a lot of work against a scratch canvas before finally dumping the output into an image element using the HTMLCanvasElement toDataURL method. This is great; I can then drag the image off the page and onto my desktop (something that the canvas element doesn't automatically do). Even though the data URL is ridiculously long, it correctly displays on the screen. However, when I was working with the original 17 megapixel image, I found that dragging the output image out of my browser would immediately crash the Chrome tab. Fortunately, Chrome has no problems with the image at my target resolution (1920x1080).

Because this extremely long data URL feels pretty sketchy, I looked to see if there was a way around it. I would love to output the results to a canvas element instead of an image. All I need is to use the DnD API to make the canvas a valid drag source. Of course, in order to do that, I need to be able to generate the PNG bytestream, as well as synthesize a File object in the browser. While it's definitely possible to build a pure-JS PNG encoder, I don't see any way to synthesize a File object. Although the DnD spec specifically asks for a File, maybe it would be happy with an arbitrary Blob instead; I don't know, and I haven't yet tried. If the spec doesn't support this use case, it's a shame; I can think of a number of cases where it would be neat to generate a file from client-side JS.

I like to think that my skills as an artist have improved in the intervening years as well. A little stylistic shading, and here is the result.

I used a different technique to generate the digital side (the original used 0s and 1s and modulated the intensity on a pixel-by-pixel basis; I achieve my shading by choosing from a larger palette of characters). Still, I feel like the end result has the same tone as the original. And just like last time, I'm pretty pleased with myself. Let me know what you think!

Edit: The code is available on GitHub. You can try it out on my site.

Saturday, January 14, 2012

Using Apache on Mac OS X to serve files outside ~/Sites

I'm working on a web project that basically contains just static HTML and Javascript. (Well, OK, there's also one small PHP script, but it might be going away in the near future.) I tend to keep all my source code in ~/src, but to host it, I also need it to appear in ~/Sites. After some small trial-and-error, I ended up putting everything (git repo and all) into ~/Sites and then symlinking to it from ~/src. It wasn't pretty, but it worked.

So I just did some reorganization that pretty much invalidated that old structure. In particular, I have moved everything that needs to be deployed into ~/src/project/web. However, I want it to be accessible via http://localhost/me/project. I tried physically moving the project back into ~/src, and then making a symlink to the subdirectory, but that didn't work. Apache would still produce 403s for all the relevant files. So I had to roll up my sleeves and dive into Apache configuration.

Before I go further, I'm compelled to pull out the old soapbox. I have painfully little experience with Apache - I have never had to configure or support it in a production environment, and that makes me happy. From this position of ignorance, I have decided that Apache is a dinosaur that should have died a long time ago. For example, instead of configuring the server from the request's point of view (as has been popularized by Rails' routing logic), it is configured from the filesystem's point of view. The default Mac OS configuration has, buried somewhere in the middle of the file, a directive that disallows the serving of all files under /. Because, I guess, they would be served by default if that directive wasn't present? But still, nobody seems to want to spend the time to produce a replacement web server, and so we struggle on. </rant>

The default Apache install on Mac OS X 10.7 uses a split configuration. The bulk of it is in /etc/apache2/httpd.conf, but each user also gets their own file in /etc/apache2/users/ (mine is me.conf), and these are all imported into the main configuration. And while the main configuration file specifies the FollowSymLinks option, I discovered that the same is not true of my personal config file. All I had to do was add the FollowSymLinks option to that file, restart Apache, and everything started working.
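
For reference, my per-user file ended up looking something like this (an approximation from memory; the directives besides FollowSymLinks are the 10.7 defaults, or close to them):

<Directory "/Users/me/Sites/">
    Options Indexes MultiViews FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>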

So if you have only basic web serving needs, the default config should suffice. If, however, you want/need to spread the files around your disk, you need to mess with the Apache configuration.

Monday, September 05, 2011

Mysterious, Blank User in 10.7 Sharing Dialog

I wanted to copy some files from my PC to my Mac. When I went to turn on SMB sharing, I came across this:

I was wondering about the identity of this phantom user. It turns out that it is the macports user. He doesn't show up on the login screen. He never showed up under 10.6. Apparently, Apple changed something about the way users are reported to applications.

If it bothers you, you can fix it with dscl.

sudo dscl . -create /Users/macports RealName macports

The first parameter is the machine you want to administer; . is apparently a shortcut for localhost. Then we give the command - we want to create a new key. Then we specify where this key should be created - in this case, the macports user's Directory Services path. Next is the name of the key - RealName is what appears to be used by the sharing dialog. (RealName is also assigned on users you create through System Preferences). Finally, we provide a value for this user's name. Now, we have this:
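
If you want to verify the change from the command line rather than through the sharing dialog, you can read the key back (reads don't require sudo):

dscl . -read /Users/macports RealName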