How exactly does a computer compute a vector-based graphical user interface into existence? How do vivid visual metaphors arise algorithmically from the first principles of logic and arithmetic? This (incomplete) article is my first attempt at a coherent answer to these questions, as concise as possible.
The interactivity of GUIs comes from their event loop:
In each iteration, the program processes inputs and generates outputs:

while !quit:
    if event = poll(): handle(event)  # input
    if dirty: draw()                  # output

fn handle(event):
    target = pick(event.xy)  # <-- input, state
    target.handle(event)     # --> state

fn draw():
    for each geometry on screen:  # <-- state
        draw(geometry)            # --> output, state
These two steps are interconnected like yin and yang — picking an
event handling object selects a code path for program state updates and
affects the geometries output by
draw(). The updated arrangement of these geometries, in turn,
dictates the behavior of pick(event.xy) in the next iteration.
At the core of 2D vector graphics is the mathematical concept of "the interior of closed curves", e.g. circles and rectangles. We've all learned how to fill the interior of a rectangle with a nested loop:
fn draw(rect):
for y in rect.y1...rect.y2:
for x in rect.x1...rect.x2:
rect.color(x, y)
This idea can be generalized to all closed curves. The only difference is that the horizontal scan line at each \(y\) could potentially enter and leave the interior of the curve multiple times, resulting in multiple intervals to be filled:
fn draw(curve):
for y in curve.y1...curve.y2:
for x1, x2 in curve.intervals(y):
for x in x1...x2:
curve.color(x, y)
We can implement pick(event.xy) using the same logic: If a
point at \((x,y)\) lies in the interior of a curve, \(x\) must be in the
middle of one of the horizontal intervals at \(y\), so there must be an odd
number of intersections on either side in the \(x\) direction. This
hit-detection method is known as the "even-odd rule". We could also apply
the "non-zero winding rule", which I will not get into here.
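As a sketch, the even-odd test might look like this in Python (function and parameter names are my own):

```python
def point_in_polygon(x, y, vertices):
    """Even-odd rule: cast a ray in the +x direction and count edge crossings."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Half-open test (y1 <= y < y2) so a shared vertex isn't counted twice.
        if (y1 <= y < y2) or (y2 <= y < y1):
            # x-coordinate where this edge crosses the horizontal line at y
            xi = x1 + (x2 - x1) * (y - y1) / (y2 - y1)
            if xi > x:
                inside = not inside
    return inside
```

An odd number of crossings on the right of \((x,y)\) means the point is inside.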
How do we find curve.intervals(y) for an arbitrary
parameterized closed curve? To answer this question, let's first look at a
simple case: polygons. For polygons, our problem becomes much more
tractable: We just need to find all line segments intersecting the
scan line.
Given a line segment between \((x_1,y_1)\) and \((x_2, y_2)\), where \(y_1 < y_2\), we know a scan line at \(y\) can only cross the line segment if \(y_1 \leq y < y_2\), in which case $$k=\frac{x_2-x_1}{y_2-y_1}, \quad intersection=\left(x_1 + k(y-y_1), y\right).$$
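This intersection test is a one-liner plus some bookkeeping; here is a possible Python version (the half-open range and names are my choices):

```python
def scanline_hit(p1, p2, y):
    """Intersection of the horizontal scan line at y with segment p1-p2, or None."""
    (x1, y1), (x2, y2) = p1, p2
    if y1 > y2:                      # orient the segment so that y1 < y2
        x1, y1, x2, y2 = x2, y2, x1, y1
    if not (y1 <= y < y2):           # half-open: the top endpoint is excluded
        return None
    k = (x2 - x1) / (y2 - y1)
    return (x1 + k * (y - y1), y)
```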
Here is the complete scan-line rasterization algorithm:
Keep track of the set of edges intersecting the scan line, along with their respective intersection points. In each iteration:
- Traverse each interval at current \(y\) and color each pixel.
- Increment \(y\), then:
- Discard edges that no longer intersect the scan line.
- Increment each intersection point \(p\) by \((k_p, 1)\).
- Add new edges that begin to intersect the scan line.
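The steps above can be sketched in Python roughly as follows, assuming integer vertex coordinates (the active-edge bookkeeping mirrors the three sub-steps):

```python
def rasterize(vertices, set_pixel):
    """Scan-line fill of a polygon with integer (x, y) vertices."""
    # Edge table: (y_min, y_max, x at y_min, slope k = dx/dy) per non-horizontal edge.
    edges = []
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        if y1 == y2:
            continue                     # horizontal edges never cross a scan line
        if y1 > y2:
            x1, y1, x2, y2 = x2, y2, x1, y1
        edges.append((y1, y2, x1, (x2 - x1) / (y2 - y1)))

    y = min(e[0] for e in edges)
    y_end = max(e[1] for e in edges)
    active = []                          # [y_max, x at current y, k] per edge
    while y < y_end:
        # Add edges that begin to intersect the scan line.
        active += [[y2, x1, k] for (y1, y2, x1, k) in edges if y1 == y]
        # Traverse each interval at the current y (even-odd pairing) and color it.
        xs = sorted(x for (_, x, _) in active)
        for xa, xb in zip(xs[::2], xs[1::2]):
            for x in range(round(xa), round(xb) + 1):
                set_pixel(x, y)
        # Increment y; discard finished edges; advance each intersection by (k, 1).
        y += 1
        active = [[y2, x + k, k] for (y2, x, k) in active if y2 > y]
```

A production rasterizer would also handle anti-aliasing and sub-pixel coordinates, which this sketch ignores.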
This canvas implements the rasterization algorithm described above.
Click on the canvas to place your vertices. You'll need at least 3.
To move and morph our polygons, we need to apply linear transformations. Since a polygon is defined by its vertices, we just need to transform each vertex before rasterization.
Strictly speaking, the transformations are linear only if we use
homogeneous coordinates \((wx, wy, w) \in \mathbb{RP}^2\) in the
projective space to represent \((x,y) \in \mathbb{R}^2 \) in
Euclidean space. The advantage of this framework is that 2D translation,
rotation and scaling are all represented as composable \(3\times3\)
matrices: $$T_{x,y} = \begin{pmatrix} 1 & 0 & x \\ 0 & 1 & y \\ 0 & 0 & 1
\end{pmatrix},\ R_{\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},\ S_{c} =
\begin{pmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$ This is
how CSS transform works under the hood.
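These matrices compose by ordinary matrix multiplication; a minimal Python sketch (no libraries, names are mine):

```python
import math

def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def scale(c):
    return [[c, 0, 0], [0, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Compose two 3x3 transforms; matmul(a, b) applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, x, y):
    # Lift (x, y) to homogeneous coordinates (x, y, 1), then project back.
    xh, yh, w = (sum(c * v for c, v in zip(row, (x, y, 1))) for row in m)
    return xh / w, yh / w
```

For example, `matmul(translate(10, 0), rotate(math.pi / 2))` rotates a vertex a quarter turn and then shifts it 10 units right.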
Drag the square to see composed transformations in action!
Now that we've handled the case of polygons, there is an obvious solution to finding curve.intervals(y): we can convert any curve into a polygon by discrete sampling! Increasing the sampling rate gives better results, and eventually the polygon becomes indistinguishable from a smooth curve once its line segments turn subpixel.
Let's take elliptic arcs as a non-trivial example. Suppose we want to draw an arc from angle \(\phi_1\) to \(\phi_2\) on an ellipse with semi-major axis \(a\) and semi-minor axis \(b\), centered at \((x,y)\) and rotated by an angle \(\theta\). We can sample \(N + 1\) points: $$\Delta\phi = \frac{\phi_2 - \phi_1}{N}, \quad \begin{pmatrix}x_i\\y_i\end{pmatrix} = T_{x,y} R_{\theta} \begin{pmatrix}a\cos\left(\phi_1 + i\Delta\phi\right)\\ b\sin\left(\phi_1 + i\Delta\phi\right)\end{pmatrix}$$
You can smooth out this curve by increasing the sampling rate \(N\)!
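The sampling formula translates directly into code; here's a possible Python version (with the rotation and translation inlined rather than done via matrices):

```python
import math

def sample_arc(cx, cy, a, b, theta, phi1, phi2, n):
    """Approximate an elliptic arc by n line segments, i.e. n + 1 sample points."""
    ct, st = math.cos(theta), math.sin(theta)
    dphi = (phi2 - phi1) / n
    points = []
    for i in range(n + 1):
        phi = phi1 + i * dphi
        # Point on the axis-aligned ellipse, then rotate by theta, then translate.
        px, py = a * math.cos(phi), b * math.sin(phi)
        points.append((cx + px * ct - py * st, cy + px * st + py * ct))
    return points
```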
Ellipses are not just interesting in their own right — thick curves, which are conceptually 1D but effectively 2D for practical purposes, can be rendered by drawing a sequence of skinny rectangles joined by circles, which are, of course, a special case of ellipses!
Try writing with your mouse/trackpad on this canvas.
Can you spot the circles and rectangles?
Before we transition to text rendering, let's take a brief detour and take a look at color gradients! Suppose we want the color to change from \(C_1\) to \(C_2\). All we need to do is to "linearly interpolate" — i.e., to calculate a weighted average of — the two colors at each pixel: $$interpolate(C_1, C_2, t) = (1-t) \cdot C_1 + t \cdot C_2,$$ where the parameter \(t\) varies with some sort of distance metric across pixels.
Take a look at these examples: $$t_{radial} = \frac{\sqrt{(x-x_o)^2 + (y-y_o)^2}}{r}, \quad t_{linear} = \frac{y - y_{min}}{y_{max} - y_{min}}$$
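In Python, both gradients reduce to a few lines (function names are mine):

```python
import math

def interpolate(c1, c2, t):
    """Weighted average of two RGB colors, t in [0, 1]."""
    return tuple((1 - t) * a + t * b for a, b in zip(c1, c2))

def t_linear(y, y_min, y_max):
    """Parameter for a vertical linear gradient."""
    return (y - y_min) / (y_max - y_min)

def t_radial(x, y, xo, yo, r):
    """Parameter for a radial gradient of radius r centered at (xo, yo)."""
    return math.hypot(x - xo, y - yo) / r
```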
Interestingly, applying linear interpolation on positions recursively based on the same \(t\) parameter at every level produces the so-called "Bézier curves".
Given control points \(p^{(0)}_0, p^{(0)}_1, \dots p^{(0)}_n\), we could sample a Bézier curve \(B(t)\) at \(t\) iteratively using the De Casteljau's algorithm:
In each iteration \(k \in [1, n]\), for each \(i \in [0, n - k]\): $$p^{(k)}_i = (1-t) \cdot p^{(k-1)}_i + t \cdot p^{(k-1)}_{i+1}$$ Finally, \(B(t) := p^{(n)}_0\)
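De Casteljau's algorithm is short enough to write out in full; a minimal Python sketch for 2D control points:

```python
def de_casteljau(points, t):
    """Evaluate the Bézier curve with control points p0..pn at parameter t."""
    pts = list(points)
    while len(pts) > 1:
        # One iteration: linearly interpolate each consecutive pair of points at t.
        pts = [((1 - t) * x1 + t * x2, (1 - t) * y1 + t * y2)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:])]
    return pts[0]
```

Sampling `de_casteljau(points, i / N)` for \(i = 0 \dots N\) yields a polyline approximation, just as with the elliptic arc.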
Click on this canvas to place your control points for the Bézier curve.
In case you haven't guessed it by now, every single letter you are reading on this page at this moment — or indeed, any text that uses OpenType/TrueType fonts — is made of nothing but linear (line segments), quadratic, and cubic Bézier curves!
The text on this canvas is rendered using the same JavaScript implementation of the scan-line rasterization algorithm as well as the discretization trick we discussed earlier.
Let's put everything together.
A GUI mainly consists of a containment hierarchy of geometries, which are best represented as a tree. Visually, containers are placed beneath the things they contain, so nodes closer to the root must be drawn first. In other words, we should perform a pre-order traversal:
fn drawView(parent):
for curve in parent.curves:
draw(curve)
for child in parent.children:
drawView(child)
Hit-detection is straightforward if we record the polygons in their draw order. We can simply do a linear search backwards for the last drawn — and therefore topmost — polygon containing the cursor position.
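A possible Python sketch of this backwards search, bundled with a compact even-odd containment test so it is self-contained (names are mine):

```python
def contains(poly, x, y):
    """Even-odd containment test for a polygon given as a list of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 <= y < y2) or (y2 <= y < y1):
            if x1 + (x2 - x1) * (y - y1) / (y2 - y1) > x:
                inside = not inside
    return inside

def pick(x, y, drawn):
    """Search the draw-ordered list of polygons backwards for the topmost hit."""
    for poly in reversed(drawn):
        if contains(poly, x, y):
            return poly
    return None
```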
These buttons are handmade from raw pixels!
You can adjust the border radius to test the hit-detection mechanism.
Within a view tree, geometries overlap and obscure each other. If for aesthetic reasons we want to see through a layer, we need to blend it with the underlying layers. To do so, instead of overwriting each pixel when drawing, we need to calculate a weighted average of the existing color \((r_2,g_2,b_2)\) at each pixel and the new color \((r_1,g_1,b_1)\) to be painted over it. The weight for each of the two colors is determined by its contribution to the alpha channel: $$a = a_1 + a_2 (1-a_1)$$ $$(r,g,b) = \frac{a_1}{a}(r_1,g_1,b_1) + \frac{a_2(1-a_1)}{a}(r_2,g_2,b_2)$$
This is known as the "over operator".
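The over operator maps directly to code; a minimal Python sketch for non-premultiplied RGBA with channels in \([0,1]\):

```python
def over(src, dst):
    """Composite src over dst; colors are non-premultiplied RGBA in [0, 1]."""
    r1, g1, b1, a1 = src
    r2, g2, b2, a2 = dst
    a = a1 + a2 * (1 - a1)
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)   # fully transparent over fully transparent
    w1, w2 = a1 / a, a2 * (1 - a1) / a
    return (w1 * r1 + w2 * r2, w1 * g1 + w2 * g2, w1 * b1 + w2 * b2, a)
```

Note that with premultiplied alpha the division by \(a\) disappears, which is why renderers often premultiply; this sketch keeps the straight-alpha form to match the equations.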
Are you able to identify the stacking order of these transparent layers?
You can increase the opacity to verify your answer!
If we want to clearly distinguish transparent layers, we could apply a blurring post-processing effect. To do so, we need to "convolve" the image with a "kernel" — that sounds rather highbrow, but it just means taking a weighted average (again!) of the neighborhood surrounding each pixel, with weights given by a function \(G(u,v)\) called the kernel: $$C = \sum_{u=-k}^{k} \sum_{v=-k}^{k}G(u,v)$$ $$Output(x,y) = \frac{1}{C}\sum_{u=-k}^{k} \sum_{v=-k}^{k} G(u,v) \cdot Input(x-u, y-v).$$
For blurring we usually use (a discrete version of) the 2D Gaussian function $$G(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right).$$ This effect is called "Gaussian blur".
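Putting the kernel and the convolution together, a possible grayscale-only Python sketch (out-of-bounds neighbors are clamped to the nearest edge pixel, which is one of several common boundary policies):

```python
import math

def gaussian_kernel(k, sigma):
    """(2k+1) x (2k+1) discrete Gaussian, normalized so the weights sum to 1."""
    g = [[math.exp(-(u * u + v * v) / (2 * sigma * sigma))
          for v in range(-k, k + 1)] for u in range(-k, k + 1)]
    c = sum(sum(row) for row in g)
    return [[w / c for w in row] for row in g]

def gaussian_blur(img, k, sigma):
    """Convolve a grayscale image (a list of rows) with a Gaussian kernel."""
    g = gaussian_kernel(k, sigma)
    h, w = len(img), len(img[0])
    return [[sum(g[u + k][v + k] *
                 img[min(max(y - u, 0), h - 1)][min(max(x - v, 0), w - 1)]
                 for u in range(-k, k + 1) for v in range(-k, k + 1))
             for x in range(w)] for y in range(h)]
```

Because the Gaussian kernel is separable, real implementations usually run a horizontal pass and then a vertical one instead of this direct 2D loop.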
Drag the lens across the image to see Gaussian blur in action!
The blur effect is written in JavaScript and performed on the CPU.
So far, we've been iterating through a lot of pixels for every single frame! One glaring problem (among several others) with this naïve approach is that it doesn't scale well with higher screen resolutions. Fortunately, for many of the steps above, we can actually process pixels in parallel —
This text is being rendered as triangles on the GPU.
In each frame, a random triangulation is performed on the CPU.