How exactly does a computer compute a vector-based graphical user interface into existence? How do vivid visual metaphors arise algorithmically from the first principles of logic and arithmetic? This (incomplete) article is my first attempt at a coherent answer to these questions, as concise as possible.
The interactivity of GUIs comes from their event loop:
In each iteration, the program processes inputs and generates outputs:

while !quit:
    if event = poll(): handle(event)  # input
    if dirty: draw()                  # output

fn handle(event):
    target = pick(event.xy)  # <-- input, state
    target.handle(event)     # --> state

fn draw():
    for each geometry on screen:  # <-- state
        draw(geometry)            # --> output, state
These two steps are interconnected like yin and yang — picking an
event handling object selects a code path for program state updates and
affects the geometries output by
draw(). The updated arrangement of these geometries, in turn,
dictates the behavior of pick(event.xy) in the next iteration.
At the core of 2D vector graphics is the mathematical concept of "the interior of closed curves", e.g. circles and rectangles. We've all learned how to fill the interior of a rectangle with a nested loop:
fn draw(rect):
for y in rect.y1...rect.y2:
for x in rect.x1...rect.x2:
rect.color(x, y)
This idea can be generalized to all closed curves. The only difference is that the horizontal scan line at each \(y\) could potentially enter and leave the interior of the curve multiple times, resulting in multiple intervals to be filled:
fn draw(curve):
for y in curve.y1...curve.y2:
for x1, x2 in curve.intervals(y):
for x in x1...x2:
curve.color(x, y)
We can implement pick(event.xy) using the same logic: If a
point at \((x,y)\) lies in the interior of a curve, \(x\) must be in the
middle of one of the horizontal intervals at \(y\), so there must be an odd
number of intersections on either side in the \(x\) direction. This
hit-detection method is known as the "even-odd rule". We could also apply
the "non-zero winding rule", which I will not get into here.
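As a sketch, the even-odd test might look like this in Python (function and parameter names are my own):

```python
def point_in_polygon(x, y, vertices):
    """Even-odd rule: cast a ray in the +x direction and count edge crossings."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Half-open test (y1 <= y < y2) so a shared vertex isn't counted twice.
        if (y1 <= y < y2) or (y2 <= y < y1):
            # x-coordinate where this edge crosses the horizontal line at y
            xi = x1 + (x2 - x1) * (y - y1) / (y2 - y1)
            if xi > x:
                inside = not inside
    return inside
```

An odd number of crossings on the right of \((x,y)\) means the point is inside.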
How do we find curve.intervals(y) for an arbitrary
parameterized closed curve? To answer this question, let's first look at a
simple case: polygons. For polygons, our problem becomes much more
tractable: We just need to find all line segments intersecting the
scan line.
Given a line segment between \((x_1,y_1)\) and \((x_2, y_2)\), where \(y_1 < y_2\), we know a scan line at \(y\) can only cross the line segment if \(y_1 \leq y < y_2\), in which case $$k=\frac{x_2-x_1}{y_2-y_1}, \quad intersection=\left(x_1 + k(y-y_1), y\right).$$
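This intersection test is a one-liner plus some bookkeeping; here is a possible Python version (the half-open range and names are my choices):

```python
def scanline_hit(p1, p2, y):
    """Intersection of the horizontal scan line at y with segment p1-p2, or None."""
    (x1, y1), (x2, y2) = p1, p2
    if y1 > y2:                      # orient the segment so that y1 < y2
        x1, y1, x2, y2 = x2, y2, x1, y1
    if not (y1 <= y < y2):           # half-open: the top endpoint is excluded
        return None
    k = (x2 - x1) / (y2 - y1)
    return (x1 + k * (y - y1), y)
```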
Here is the complete scan-line rasterization algorithm:
Keep track of the set of edges intersecting the scan line, along with their respective intersection points. In each iteration:
- Traverse each interval at current \(y\) and color each pixel.
- Increment \(y\), then:
- Discard edges that no longer intersect the scan line.
- Increment each intersection point \(p\) by \((k_p, 1)\).
- Add new edges that begin to intersect the scan line.
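The steps above can be sketched in Python roughly as follows, assuming integer vertex coordinates (the active-edge bookkeeping mirrors the three sub-steps):

```python
def rasterize(vertices, set_pixel):
    """Scan-line fill of a polygon with integer (x, y) vertices."""
    # Edge table: (y_min, y_max, x at y_min, slope k = dx/dy) per non-horizontal edge.
    edges = []
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        if y1 == y2:
            continue                     # horizontal edges never cross a scan line
        if y1 > y2:
            x1, y1, x2, y2 = x2, y2, x1, y1
        edges.append((y1, y2, x1, (x2 - x1) / (y2 - y1)))

    y = min(e[0] for e in edges)
    y_end = max(e[1] for e in edges)
    active = []                          # [y_max, x at current y, k] per edge
    while y < y_end:
        # Add edges that begin to intersect the scan line.
        active += [[y2, x1, k] for (y1, y2, x1, k) in edges if y1 == y]
        # Traverse each interval at the current y (even-odd pairing) and color it.
        xs = sorted(x for (_, x, _) in active)
        for xa, xb in zip(xs[::2], xs[1::2]):
            for x in range(round(xa), round(xb) + 1):
                set_pixel(x, y)
        # Increment y; discard finished edges; advance each intersection by (k, 1).
        y += 1
        active = [[y2, x + k, k] for (y2, x, k) in active if y2 > y]
```

A production rasterizer would also handle anti-aliasing and sub-pixel coordinates, which this sketch ignores.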
This canvas implements the rasterization algorithm described above.
Click on the canvas to place your vertices. You'll need at least 3.
To move and morph our polygons, we need to apply linear transformations. Since a polygon is defined by its vertices, we just need to transform each vertex before rasterization.
Strictly speaking, the transformations are linear only if we use
homogeneous coordinates \((wx, wy, w) \in \mathbb{RP}^2\) in the
projective space to represent \((x,y) \in \mathbb{R}^2 \) in
Euclidean space. The advantage of this framework is that 2D translation,
rotation and scaling are all represented as composable \(3\times3\)
matrices: $$T_{x,y} = \begin{pmatrix} 1 & 0 & x \\ 0 & 1 & y \\ 0 & 0 & 1
\end{pmatrix},\ R_{\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},\ S_{c} =
\begin{pmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$ This is
how CSS transform works under the hood.
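These matrices compose by ordinary matrix multiplication; a minimal Python sketch (no libraries, names are mine):

```python
import math

def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def scale(c):
    return [[c, 0, 0], [0, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Compose two 3x3 transforms; matmul(a, b) applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, x, y):
    # Lift (x, y) to homogeneous coordinates (x, y, 1), then project back.
    xh, yh, w = (sum(c * v for c, v in zip(row, (x, y, 1))) for row in m)
    return xh / w, yh / w
```

For example, `matmul(translate(10, 0), rotate(math.pi / 2))` rotates a vertex a quarter turn and then shifts it 10 units right.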
Drag the square to see composed transformations in action!
Now that we've handled the case of polygons, there is an obvious solution to finding curve.intervals(y): we can convert any curve into a polygon by discrete sampling! Increasing the sampling rate gives better results, and eventually the polygon becomes indistinguishable from a smooth curve once its line segments turn subpixel.
Let's take elliptic arcs as a non-trivial example. Suppose we want to draw an arc from angle \(\phi_1\) to \(\phi_2\) on an ellipse with semi-major axis \(a\) and semi-minor axis \(b\), centered at \((x,y)\) and rotated by an angle \(\theta\). We can sample \(N + 1\) points: $$\Delta\phi = \frac{\phi_2 - \phi_1}{N}, \quad \begin{pmatrix}x_i\\y_i\end{pmatrix} = T_{x,y} R_{\theta} \begin{pmatrix}a\cos\left(\phi_1 + i\Delta\phi\right)\\ b\sin\left(\phi_1 + i\Delta\phi\right)\end{pmatrix}$$
You can smooth out this curve by increasing the sampling rate \(N\)!
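The sampling formula translates directly into code; here's a possible Python version (with the rotation and translation inlined rather than done via matrices):

```python
import math

def sample_arc(cx, cy, a, b, theta, phi1, phi2, n):
    """Approximate an elliptic arc by n line segments, i.e. n + 1 sample points."""
    ct, st = math.cos(theta), math.sin(theta)
    dphi = (phi2 - phi1) / n
    points = []
    for i in range(n + 1):
        phi = phi1 + i * dphi
        # Point on the axis-aligned ellipse, then rotate by theta, then translate.
        px, py = a * math.cos(phi), b * math.sin(phi)
        points.append((cx + px * ct - py * st, cy + px * st + py * ct))
    return points
```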
Ellipses are not just interesting in their own right — thick curves, which are conceptually 1D but effectively 2D for practical purposes, can be rendered by drawing a sequence of skinny rectangles joined by circles, which are, of course, a special case of ellipses!
Try writing with your mouse/trackpad on this canvas.
Can you spot the circles and rectangles?
Before we transition to text rendering, let's take a brief detour and take a look at color gradients! Suppose we want the color to change from \(C_1\) to \(C_2\). All we need to do is to "linearly interpolate" — i.e., to calculate a weighted average of — the two colors at each pixel: $$interpolate(C_1, C_2, t) = (1-t) \cdot C_1 + t \cdot C_2,$$ where the parameter \(t\) varies with some sort of distance metric across pixels.
Take a look at these examples: $$t_{radial} = \frac{\sqrt{(x-x_o)^2 + (y-y_o)^2}}{r}, \quad t_{linear} = \frac{y - y_{min}}{y_{max} - y_{min}}$$
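In Python, both gradients reduce to a few lines (function names are mine):

```python
import math

def interpolate(c1, c2, t):
    """Weighted average of two RGB colors, t in [0, 1]."""
    return tuple((1 - t) * a + t * b for a, b in zip(c1, c2))

def t_linear(y, y_min, y_max):
    """Parameter for a vertical linear gradient."""
    return (y - y_min) / (y_max - y_min)

def t_radial(x, y, xo, yo, r):
    """Parameter for a radial gradient of radius r centered at (xo, yo)."""
    return math.hypot(x - xo, y - yo) / r
```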
Interestingly, applying linear interpolation on positions recursively based on the same \(t\) parameter at every level produces the so-called "Bézier curves".
Given control points \(p^{(0)}_0, p^{(0)}_1, \dots p^{(0)}_n\), we could sample a Bézier curve \(B(t)\) at \(t\) iteratively using the De Casteljau's algorithm:
In each iteration \(k \in [1, n]\), for each \(i \in [0, n - k]\): $$p^{(k)}_i = (1-t) \cdot p^{(k-1)}_i + t \cdot p^{(k-1)}_{i+1}$$ Finally, \(B(t) := p^{(n)}_0\)
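De Casteljau's algorithm is short enough to write out in full; a minimal Python sketch for 2D control points:

```python
def de_casteljau(points, t):
    """Evaluate the Bézier curve with control points p0..pn at parameter t."""
    pts = list(points)
    while len(pts) > 1:
        # One iteration: linearly interpolate each consecutive pair of points at t.
        pts = [((1 - t) * x1 + t * x2, (1 - t) * y1 + t * y2)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:])]
    return pts[0]
```

Sampling `de_casteljau(points, i / N)` for \(i = 0 \dots N\) yields a polyline approximation, just as with the elliptic arc.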
Click on this canvas to place your control points for the Bézier curve.
In case you haven't guessed it by now, every single letter you are reading on this page at this moment — or indeed, any text that uses OpenType/TrueType fonts — is made of nothing but linear (line segments), quadratic, and cubic Bézier curves!
The text on this canvas is rendered using the same JavaScript implementation of the scan-line rasterization algorithm as well as the discretization trick we discussed earlier.
Let's put everything together.
A GUI mainly consists of a containment hierarchy of geometries, which are best represented as a tree. Visually, containers are placed beneath the things they contain, so nodes closer to the root must be drawn first. In other words, we should perform a pre-order traversal:
fn drawView(parent):
for curve in parent.curves:
draw(curve)
for child in parent.children:
drawView(child)
Hit-detection is straightforward if we record the polygons in their draw order. We can simply do a linear search backwards for the last drawn — and therefore topmost — polygon containing the cursor position.
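A possible Python sketch of this backwards search, bundled with a compact even-odd containment test so it is self-contained (names are mine):

```python
def contains(poly, x, y):
    """Even-odd containment test for a polygon given as a list of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 <= y < y2) or (y2 <= y < y1):
            if x1 + (x2 - x1) * (y - y1) / (y2 - y1) > x:
                inside = not inside
    return inside

def pick(x, y, drawn):
    """Search the draw-ordered list of polygons backwards for the topmost hit."""
    for poly in reversed(drawn):
        if contains(poly, x, y):
            return poly
    return None
```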
These buttons are handmade from raw pixels!
You can adjust the border radius to test the hit-detection mechanism.
Within a view tree, geometries overlap and obscure each other. If for aesthetic reasons we want to see through a layer, we need to blend it with the underlying layers. To do so, instead of overwriting each pixel when drawing, we need to calculate a weighted average of the existing color \((r_2,g_2,b_2)\) at each pixel and the new color \((r_1,g_1,b_1)\) to be painted over it. The weight for each of the two colors is determined by its contribution to the alpha channel: $$a = a_1 + a_2 (1-a_1)$$ $$(r,g,b) = \frac{a_1}{a}(r_1,g_1,b_1) + \frac{a_2(1-a_1)}{a}(r_2,g_2,b_2)$$
This is known as the "over operator".
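The over operator maps directly to code; a minimal Python sketch for non-premultiplied RGBA with channels in \([0,1]\):

```python
def over(src, dst):
    """Composite src over dst; colors are non-premultiplied RGBA in [0, 1]."""
    r1, g1, b1, a1 = src
    r2, g2, b2, a2 = dst
    a = a1 + a2 * (1 - a1)
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)   # fully transparent over fully transparent
    w1, w2 = a1 / a, a2 * (1 - a1) / a
    return (w1 * r1 + w2 * r2, w1 * g1 + w2 * g2, w1 * b1 + w2 * b2, a)
```

Note that with premultiplied alpha the division by \(a\) disappears, which is why renderers often premultiply; this sketch keeps the straight-alpha form to match the equations.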
Are you able to identify the stacking order of these transparent layers?
You can increase the opacity to verify your answer!
If we want to clearly distinguish transparent layers, we could apply a blurring post-processing effect. To do so, we need to "convolve" the image with a "kernel" — that sounds rather highbrow, but it just means taking a weighted average (again!) of the neighborhood surrounding each pixel, with weights given by a function \(G(u,v)\) called the kernel: $$C = \sum_{u=-k}^{k} \sum_{v=-k}^{k}G(u,v)$$ $$Output(x,y) = \frac{1}{C}\sum_{u=-k}^{k} \sum_{v=-k}^{k} G(u,v) \cdot Input(x-u, y-v).$$
For blurring we usually use (a discrete version of) the 2D Gaussian function $$G(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right).$$ This effect is called "Gaussian blur".
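Putting the kernel and the convolution together, a possible grayscale-only Python sketch (out-of-bounds neighbors are clamped to the nearest edge pixel, which is one of several common boundary policies):

```python
import math

def gaussian_kernel(k, sigma):
    """(2k+1) x (2k+1) discrete Gaussian, normalized so the weights sum to 1."""
    g = [[math.exp(-(u * u + v * v) / (2 * sigma * sigma))
          for v in range(-k, k + 1)] for u in range(-k, k + 1)]
    c = sum(sum(row) for row in g)
    return [[w / c for w in row] for row in g]

def gaussian_blur(img, k, sigma):
    """Convolve a grayscale image (a list of rows) with a Gaussian kernel."""
    g = gaussian_kernel(k, sigma)
    h, w = len(img), len(img[0])
    return [[sum(g[u + k][v + k] *
                 img[min(max(y - u, 0), h - 1)][min(max(x - v, 0), w - 1)]
                 for u in range(-k, k + 1) for v in range(-k, k + 1))
             for x in range(w)] for y in range(h)]
```

Because the Gaussian kernel is separable, real implementations usually run a horizontal pass and then a vertical one instead of this direct 2D loop.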
Drag the lens across the image to see Gaussian blur in action!
The blur effect is written in JavaScript and performed on the CPU.
So far, we've been iterating through a lot of pixels for every single frame! One glaring problem (among several others) with this naïve approach is that it doesn't scale well with higher screen resolutions. Fortunately, for many of the steps above, we can actually process pixels in parallel —
This text is being rendered as triangles on the GPU.
In each frame, a random triangulation is performed on the CPU.