Handmade Maths: Cosine and Sine

Have you ever wondered how the magical cosine and sine buttons on your calculator work? If so, then this article will not only answer this question but also develop most of trigonometry along the way. The only prerequisites are a basic understanding of vectors and general comfort with algebra. Let's roll up our sleeves and begin exploring the fascinating world of rotations!

Foundation

A natural question that comes up after you learn about the basic operations in vector algebra - adding, subtracting, and scaling - is how to create rotated versions of a vector. Given how frequently rotations come up when solving problems, not having them as a vector transformation would make the entire framework seem incomplete.

Before we begin exploring this topic, we'll need to come up with a preliminary method for assigning numerical values to distinct rotation transformations. Suppose you decided to rotate a stick around a fixed (pivot) point on it. If we pick a distinguished point on the stick, different from the pivot, and keep track of its location as we rotate, we'll eventually trace out a circular shape and the point will arrive back at its initial position. This seems like an important and easy-to-identify characteristic of the transformation so we can assign the number 1 to a rotation that takes any point on the circle back to its starting location exactly once - the centre point is special in that it never moves. Note that the "1" here is a distinct rotation unit and shouldn't be confused with the units one would use to measure distances - I'll refer to it as a turn from now on. Later on we'll need to figure out a way to convert between turn and distance units, but our most pressing task is to address the problem of interpolation, or how to perform rotations that correspond to different turn values.

An intuitive way to think about turn values between 0 and 1 is to associate them with the boundary traced out by a non-pivot point in a full turn. For example, half a turn would be the point that splits the boundary into two equal pieces, meaning that we can rotate one of the pieces around the pivot to perfectly overlap the other. Constructing this point is quite easy: Draw a line, pick a pivot point on it, and draw a circle centred on that point with the help of a compass. The two places where the circle intersects the line are the 0 (same as 1) and 1/2 turn points.

A quarter (1/4) turn is a bit trickier because we need to figure out how to split the 1/2-turn segment (bounded by 0 and 1/2) in half. Fortunately, there's an ancient compass construction that can help us accomplish this task! Place the needle on either the 0 or 1/2 point and draw a circle with a radius that's larger than the distance between the point and the centre. Repeat the same process with the other boundary point and observe that the new circles intersect in two additional points, and the line going through these points will split the half-turn arcs of the original circle in half:

The 0, 1/4, 1/2, and 3/4 turn points together with the circle centre can be used to set up a two-dimensional Cartesian coordinate system, which would allow us to establish the following connection between turn values and vectors:

Turn | Vector
0, 1 | (1, 0)
1/4  | (0, 1)
1/2  | (-1, 0)
3/4  | (0, -1)

Once again, the Cartesian coordinate values for these vectors are distinct from the turn values despite using the same numeral system - this is an important distinction that is often neglected in school due to the use of implicit units. While this would be a great starting point for developing the ideas behind vector algebra, in the interest of time, I'll assume the reader is familiar with the basics and address the problem of rotation in the context of vectors.

How would you rotate a vector by a quarter turn counterclockwise? It's usually a good idea to begin exploring such questions with the simplest possible configuration you can imagine - the basis vectors. Starting with (1, 0), rotating a quarter turn counterclockwise results in the vector (0, 1) by definition. Our first observation is that the position of the numbers 0 and 1 was swapped, so we can conjecture that rotating an arbitrary 2D vector will involve swapping its x and y coordinates. Swapping the x and y coordinates of a vector is actually a reflection across the vector (1, 1) (the line y = x). A single test case is never sufficient so let's continue rotating!

Now that we're at (0, 1), applying the same rotation would give us (-1, 0). While the coordinates had to be swapped as we expected, the x value also had to be negated. Fortunately, it's quite easy to reconcile this new finding with the initial conjecture because negating zero doesn't do anything! We can now update our rotation algorithm: Rotating a quarter turn counterclockwise requires a coordinate swap followed by a negation of the new x value. Take note of the fact that negating a vector's x coordinate is the same as reflecting it across the (0, 1) axis.

Does the updated conjecture hold after the third rotation? Indeed, it does! From (-1, 0) we have to go to (0, -1) - the values are swapped and the zero is implicitly negated. The final rotation takes us from (0, -1) to (1, 0) and the conjecture still holds. Marvellous! We've now gone a full circle and our refined initial observation held up beautifully.
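Before moving on, here's the rule captured as a tiny Python sketch (the helper name rotQ is my own choice; we'll meet the same notation again later):

def rotQ(v):
    # Quarter turn counterclockwise: swap the coordinates,
    # then negate the new x value
    x, y = v
    return (-y, x)

# Walking the basis vector around the full circle:
assert rotQ((1, 0)) == (0, 1)
assert rotQ((0, 1)) == (-1, 0)
assert rotQ((-1, 0)) == (0, -1)
assert rotQ((0, -1)) == (1, 0)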

Let's check whether this formula works for an arbitrary vector (drag the blue point to move it around):

Everything looks good to me! With some understanding of the area measure and one of the most important formulas in mathematics - ad - bc, which is the parallelogram area spanned by the vectors (a, b) and (c, d) - we can strengthen our conjecture through a few basic observations:

First, what happens to the area spanned by two vectors when we rotate one of them a quarter turn counterclockwise (What a mouthful!)? For the vectors (1, 0) and (0, 1), the resulting area would be zero because the rotated vector ends up overlapping the other one! This should make intuitive sense: If we rotate one of the vectors and observe the spanned area between them to be zero, this means the rotated version either overlapped with the other vector or ended up pointing in the exact opposite direction.

What about the area between an arbitrary vector (a, b) and its rotated version (-b, a)? Plugging into our area formula we get a^2 + b^2. Does the result remind you of the Pythagorean theorem?

Let's take this a step further and check the parallelogram area spanned by the vectors (a, b) and (c, d), except we'll rotate the second one before we measure the area. Plugging in we get yet another famous relation: ac + bd. As it turns out, the dot product is somehow related to the area measure; however, this rabbit hole is quite deep so I'll put the topic aside and address it in a future article.

We can use our observations to devise a useful test: Two vectors are perpendicular - "quarter turn" to each other - if the dot product between them equals zero.
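As a quick sanity check of the test, here's a small sketch (the dot helper is my own):

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

# (3, 4) rotated a quarter turn counterclockwise is (-4, 3),
# so the dot product between the pair must vanish:
assert dot((3, 4), (-4, 3)) == 0
# Non-perpendicular vectors give a non-zero result:
assert dot((3, 4), (4, 3)) != 0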

As it turns out, swapping the coordinates and negating one of them is one of the most important operations in mathematics! Unfortunately, students are almost never exposed to the true power behind it, and to make matters even more confusing, sometimes people refer to this operation as a number!

If you were reading carefully, you might be wondering why I pointed out the implicit reflections in the rotation formula. While exploring this observation isn't necessary to understand the contents of this article, it provides important insight into more advanced mathematical treatments of rotation. We were somehow able to rotate a vector by a quarter turn counterclockwise with the help of two successive reflections across vectors separated by half the turn value (the line y = x splits the 0-1/4 arc into equal pieces). The curious amongst you may wish to investigate this lead on your own!
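If you'd like a concrete starting point for that investigation, here's a minimal sketch (helper names are mine) showing the two reflections composing into our quarter turn:

def reflectDiagonal(v):
    # Reflection across the line y = x: swap the coordinates
    x, y = v
    return (y, x)

def reflectVertical(v):
    # Reflection across the vertical (0, 1) axis: negate the x coordinate
    x, y = v
    return (-x, y)

# Two reflections, one after the other, act as a quarter turn
# counterclockwise - exactly our swap-and-negate formula:
assert reflectVertical(reflectDiagonal((3, 4))) == (-4, 3)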

Arbitrary Rotations

Our simple rotation formula has a glaring deficiency: What if we wanted to rotate by a different turn value? This problem appears to be rather tricky so let's first reduce it to something that's easier to work with - a right (quarter turn) triangle. We construct this triangle by raising a perpendicular from the horizontal radius vector (1, 0) until the hypotenuse intersects the circle at the turn point we're interested in:

While we don't really know what turn value corresponds to this intersection point - even though we can compute its Cartesian coordinates - we can use the right triangle as an intermediate construction until we develop more sophisticated tools.

For the sake of simplicity, let's assume that the radius of the circle is 1 distance unit, which makes the base of our triangle equal to 1. This simplification also has the convenient side effect of making the triangle's height act as a multiplier - if you multiply the vertical unit vector by this length you'll get the height vector. We're about to make good use of this property!

The height can be arbitrary so let's just call it h, and the length of the hypotenuse will be denoted l. With this setup in place, suppose I asked you to find an algorithm that rotates the base \vec{e}_1 = (1, 0) so that it overlaps the hypotenuse. How would you do it?

If you're comfortable with the idea behind vector scaling, you might suggest multiplying the hypotenuse vector by the reciprocal of its length l = \sqrt{1 + h^2}:

\vec{e}_r = \frac{1}{l} \cdot (1, h) = \bigg(\frac{1}{l}, \frac{h}{l}\bigg)

While this is, indeed, the correct answer, such an algorithm only works for starting vectors that lie flat on the horizontal axis. A more general algorithm would first construct the hypotenuse vector before performing the rescaling. Using only the vector we want to rotate as a starting point and the value h, how do we construct a hypotenuse?

Because we ultimately want a quarter turn separation between the base and the height, we can use the handy quarter-turn-counterclockwise rotation formula (represented by the function rotQ) to rotate the base vector. Next, we need to rescale it to a unit vector (normalize), and then multiply it by h to make its length equal to the height. Our starting vector \vec{e}_1 had a length of 1 and that's convenient because we can skip the normalization step. For now, let's assume our more general vector \vec{v} also has a unit length. Rotating and rescaling \vec{v} by h can be expressed as follows:

\vec{p} = h \cdot \mathrm{rotQ}(\vec{v})

We then move our new construction from the origin to the correct place by adding the starting vector to it:

\vec{h} = \vec{p} + \vec{v} = h \cdot \mathrm{rotQ}(\vec{v}) + \vec{v}

With the hypotenuse constructed, we can now rescale to get the final rotated vector \vec{v}_r:

\vec{v}_r = ( h \cdot \mathrm{rotQ}(\vec{v}) + \vec{v} ) \cdot \frac{1}{l} = \frac{h}{l} \cdot \mathrm{rotQ}(\vec{v}) + \frac{1}{l} \cdot \vec{v}

Let's introduce two helper variables to reduce the visual noise in these equations even further:

c = \frac{1}{l} \quad s = \frac{h}{l}

\vec{v}_r = s \cdot \mathrm{rotQ}(\vec{v}) + c \cdot \vec{v}

An amazingly simple relation, isn't it? Notice that c and s are the normalized coordinates of the hypotenuse, so we can leverage this finding to overcome h's limitation of only being able to represent rotations strictly less than a quarter turn (Can you see why that is?). Mapping a turn value to a vector instead of a scalar gives us a lot of flexibility! Setting \vec{v} equal to an arbitrary unit vector (a, b) and unpacking the formula above we get:

\vec{v}_r = s \cdot (-b, a) + c \cdot (a, b) = (c a - s b, s a + c b)

If you've taken a course on linear algebra you might recognise this formula as a 2D rotation matrix! It turns out that (a, b) doesn't even need to be a unit vector as long as c and s satisfy the following constraint: c^2 + s^2 = 1. To verify this, all we need to do is bring back the initial normalization step for \vec{v} that we omitted:

\vec{v}_r = \bigg(\frac{s}{\| \vec{v} \|} \cdot \mathrm{rotQ}(\vec{v}) + \frac{c}{\| \vec{v} \|} \cdot \vec{v}\bigg) \cdot \| \vec{v} \| = s \cdot \mathrm{rotQ}(\vec{v}) + c \cdot \vec{v}

The double vertical bars \| represent the length of the vector. We're normalising the offset before adding it to the rotated vector because we want h to apply to all vectors consistently, regardless of their length. We also need to multiply by the length at the end to undo the normalization. As you can see, the vector length value simply cancels out.

Another way to examine the behaviour of our arbitrary rotation formula is to compute the length of the output vector:

\begin{aligned} \|\vec{v}_r\|^2 &= (c a - s b)^{2} + (s a + c b)^{2} = a^{2} c^{2} - \cancel{2 a b c s} + b^{2} s^{2} + a^{2} s^{2} + \cancel{2 a b c s} + b^{2} c^{2} \\ &= a^{2} (c^{2} + s^{2}) + b^{2} (c^{2} + s^{2}) = (a^{2} + b^{2})(c^{2} + s^{2}) \\ \|\vec{v}_r\| &= \sqrt{a^{2} + b^{2}} \sqrt{c^{2} + s^{2}} \end{aligned}

It looks like the length of the rotated vector is the product of the lengths of the two input vectors! This is why (c, s) being a unit vector preserves the length of the input when performing the rotation.
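Here's the whole relation as a small Python sketch (the helper name rotateBy is mine); feeding it the base vector reproduces the normalized hypotenuse from earlier:

import math

def rotateBy(v, c, s):
    # v_r = s * rotQ(v) + c * v, with rotQ unpacked inline
    a, b = v
    return (c*a - s*b, s*a + c*b)

# Encode a rotation by building (c, s) from a height h:
h = 0.5
l = math.sqrt(1 + h*h)
print(rotateBy((1, 0), 1/l, h/l)) # the normalized hypotenuse itself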

Now that we have an algorithm to perform arbitrary vector rotations, we can have some fun with it! Let's generate a bunch of successive rotations:

Composing Rotations

Now is a good time to reflect on what we've discovered so far. First, the inputs to our arbitrary rotation formula are actually two vectors. Moreover, the order in which they go in doesn't really matter (feel free to check this for yourself) even though we started off under the assumption that one of the vectors will represent a turn value - and will therefore be distinguished from the other. One conclusion we can draw from this is that both vectors can play the role of a turn value representative and we've stumbled upon a rather beautiful way of composing rotations! For example, if we rotated (1, 0) by a vector \vec{v}_1 = (a, b), and then rotated the result again by a vector \vec{v}_2 = (c, d), we'd get \vec{v}_3 = (ac - bd, ad + bc), which is where (1, 0) would end up if we directly rotated it by the sum of the turn values encoded in \vec{v}_1 and \vec{v}_2.
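A quick numerical sketch of that composition property (names are mine, and I'm using \sqrt{0.5} to encode an eighth of a turn):

import math

def composeRotations(v1, v2):
    # (ac - bd, ad + bc): rotating by v1 and then by v2 is the
    # same as a single rotation by this composed vector
    a, b = v1
    c, d = v2
    return (a*c - b*d, a*d + b*c)

# Two eighth turns compose into a quarter turn:
eighth = (math.sqrt(0.5), math.sqrt(0.5))
print(composeRotations(eighth, eighth)) # ~(0.0, 1.0)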

This rotation composition property is incredibly useful in practical applications and is not limited to rotations in 2 dimensions. In fact, mathematicians have even discovered an algorithm for constructing such "rotation algebras", though exploring it is left as an exercise for the curious reader.

As you have seen so far, we managed to use our humble quarter turn rotation - and a little bit of basic vector intuition - to develop a more general formula that can aid us in solving problems. However, we're not done yet: The question of converting between turn and distance units is still unanswered! Fortunately, we now have all the tools we need to tackle it, so let's do that next.

Travelling Around The Circle

Here's a question: Starting at (1, 0), if we travelled a distance of d units counterclockwise around the circle, what are the coordinates of the location we'll arrive at?

Perhaps we can repurpose our arbitrary rotation formula to answer this question! The key observation is that we can use multiples of h to approximate a distance of d units along the circle because the smaller the h, the closer it hugs the arc of the circle:


If we divide d by h, we'll get the rough number of successive rotations we need to perform to arrive at the final location. But what if d is not a whole number? In that case, we can either round the rotation count to the nearest integer or make h itself a multiple of the lowest power of 10 in d. For example, if d is equal to 1.2345, we can make h a multiple of 0.0001 so that d/h is guaranteed to be a whole number. Either way, non-integer distance values won't really pose a problem. Let's see what we get for d equal to 2 and h equal to 0.001, which would require 2000 rotations:

import math

def circularTravel(d, h):
    c = 1/math.sqrt(1 + h*h)
    s = h*c
    x = 1
    y = 0

    rotations = round(d/h)
    for _ in range(rotations):
        xn = c*x - s*y
        yn = s*x + c*y

        x = xn
        y = yn

    print('-'*32)
    print(f'd={d}, h={h}, rotations={rotations}')
    print(f'x={x}\ny={y}')

circularTravel(2, 0.1)
circularTravel(2, 0.01)
circularTravel(2, 0.001)
--------------------------------
d=2, h=0.1, rotations=20
x=-0.41011187409312255
y=0.9120352244994887
--------------------------------
d=2, h=0.01, rotations=200
x=-0.4160862194310148
y=0.9093251662632378
--------------------------------
d=2, h=0.001, rotations=2000
x=-0.4161462303490977
y=0.9092977042564712

I added two more configurations so we have something to compare against. The brute-force algorithm works but is extremely inefficient, especially when d becomes really large. Can we somehow optimize it to make it faster? We can!

Finding Patterns

Recurrence relations are great at providing succinct formulas, but they tend to hide a lot of the patterns behind the underlying computation. Our first step is to create a trace of the repeated rotation by computing the fully expanded equations at each step of evaluation. Once we have that, we can look for patterns in it, and, ideally, come up with an algorithm that generates the nth rotation directly.

Pattern spotting is a time-honoured practice in both mathematics and programming because it can give you insight into the structure of a system, and understanding the structure is a prerequisite for modifying it without breaking it.

There's one problem, however, which is that creating a trace involves a lot of algebraic manipulations which, while simple, are tedious and error prone. Mathematicians back in the day had to patiently do these by hand (or offload that work to their assistants), but, as inhabitants of the 21st century, we have a much better option - make the computer do the algebra for us! To do this, we'll need a computer algebra system package and I chose Python's SymPy because it's simple and free. While it's not strictly necessary, you can follow along by installing the Python environment and executing pip install sympy in your operating system terminal.

First, let's make SymPy evaluate a trace of the recurrence relation as it is:

from sympy import *

rotations = 12
c, s = var('c, s')
x = 1
y = 0

xs = [x]
ys = [y]
for _ in range(rotations):
    xn = c*x - s*y
    yn = s*x + c*y

    xs.append(xn.expand())
    ys.append(yn.expand())

    x = xn
    y = yn

print('x coordinates:')
for x in xs:
    print(x)

print('\ny coordinates:')
for y in ys:
    print(y)

n | x | y
0 | 1 | 0
1 | c | s
2 | c^2 - s^2 | 2cs
3 | c^3 - 3c s^2 | 3c^2 s - s^3
4 | c^4 - 6c^2 s^2 + s^4 | 4c^3 s - 4c s^3
5 | c^5 - 10c^3 s^2 + 5c s^4 | 5c^4 s - 10c^2 s^3 + s^5
6 | c^6 - 15c^4 s^2 + 15c^2 s^4 - s^6 | 6c^5 s - 20c^3 s^3 + 6c s^5
7 | c^7 - 21c^5 s^2 + 35c^3 s^4 - 7c s^6 | 7c^6 s - 35c^4 s^3 + 21c^2 s^5 - s^7
8 | c^8 - 28c^6 s^2 + 70c^4 s^4 - 28c^2 s^6 + s^8 | 8c^7 s - 56c^5 s^3 + 56c^3 s^5 - 8c s^7
9 | c^9 - 36c^7 s^2 + 126c^5 s^4 - 84c^3 s^6 + 9c s^8 | 9c^8 s - 84c^6 s^3 + 126c^4 s^5 - 36c^2 s^7 + s^9
10 | c^10 - 45c^8 s^2 + 210c^6 s^4 - 210c^4 s^6 + 45c^2 s^8 - s^10 | 10c^9 s - 120c^7 s^3 + 252c^5 s^5 - 120c^3 s^7 + 10c s^9
11 | c^11 - 55c^9 s^2 + 330c^7 s^4 - 462c^5 s^6 + 165c^3 s^8 - 11c s^10 | 11c^10 s - 165c^8 s^3 + 462c^6 s^5 - 330c^4 s^7 + 55c^2 s^9 - s^11

Looking at these results may be overwhelming if this is your first time doing something like this, so let's take it slow and see if we can make sense of the information.

Before we begin analysing, it's important to have a good idea about what each row represents. For example, row #7 for the x coordinates is a multivariate (multiple variables) polynomial which, when evaluated for any given c and s, will give us the x coordinate of (1, 0) after we rotate it 7 times.

The first step we'll take is to check for any patterns in the powers. One observation that immediately stands out to me is the fact that the powers of c and s in each term add up to the same value - the rotation number. Recall that c and s are the coordinates of a normalized direction vector, which means there's a division by a square root involved in their computation. However, on every even rotation this square root cancels out and makes the formula a lot more pleasant. Let's focus on the second iteration, which is effectively a rotation by twice the angle.

Suppose c and s come from normalising a direction vector \vec{d} = (x, y). Reading off the x and y values for the second iteration we get \vec{e}_2 = (c^2 - s^2, 2cs), which is the vector we get after we rotate (1, 0) by twice the angle \vec{d} encodes. Replacing c and s with their corresponding values from the normalized \vec{d} we get the following:

c = \frac{x}{\sqrt{x^2 + y^2}}, \quad s = \frac{y}{\sqrt{x^2 + y^2}}

\vec{e}_2 = \bigg(\frac{x^2 - y^2}{x^2 + y^2}, \frac{2 x y}{x^2 + y^2}\bigg)

Remarkably, plugging in any rational numbers for x and y (with the exception of two zeros) is going to give us a pair of rational numbers (a, b) that satisfy the relation a^2 + b^2 = 1. This formula can also be used to create a rational parametrization of the unit circle by fixing one of the inputs (pick any value for it like 1) and varying the other, though the resulting points will not be evenly spaced along the circle's arc. There's a lot more to say about this formula but it would distract us from our task at hand - the curious reader may wish to explore topics like rational circular interpolation between two vectors.
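As a taste of that rabbit hole, here's a small sketch (the function name is mine) that uses exact fractions to confirm the claim:

from fractions import Fraction

def rationalCirclePoint(x, y):
    # Send a rational pair (not both zero) to a rational point
    # on the unit circle via the double-angle formulas above
    n = x*x + y*y
    return (Fraction(x*x - y*y, n), Fraction(2*x*y, n))

a, b = rationalCirclePoint(2, 1)
print(a, b)              # 3/5 4/5 - the (3, 4, 5) triple in disguise
assert a*a + b*b == 1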

Another pattern you might have picked up on is that all s powers in the x coordinate are even, while c powers alternate between even and odd in both x and y, depending on the row. A similar observation can be made about the y coordinates, except the power of s is always odd. The reason why I find this interesting is because we already have a relationship between c and s: c^2 + s^2 = 1, which can be rearranged into s^2 = 1 - c^2.

If we were to replace every occurrence of s^2 with 1 - c^2 in the x coordinate we can eliminate s entirely:

for x in xs[1:]: # Skip the first entry because it's an integer
    print(x.subs(s**2, 1 - c**2).expand())

\begin{aligned} &c \\ &2 c^{2} - 1 \\ &4 c^{3} - 3 c \\ &8 c^{4} - 8 c^{2} + 1 \\ &16 c^{5} - 20 c^{3} + 5 c \\ &32 c^{6} - 48 c^{4} + 18 c^{2} - 1 \\ &64 c^{7} - 112 c^{5} + 56 c^{3} - 7 c \\ &128 c^{8} - 256 c^{6} + 160 c^{4} - 32 c^{2} + 1 \\ &256 c^{9} - 576 c^{7} + 432 c^{5} - 120 c^{3} + 9 c \\ &512 c^{10} - 1280 c^{8} + 1120 c^{6} - 400 c^{4} + 50 c^{2} - 1 \\ &1024 c^{11} - 2816 c^{9} + 2816 c^{7} - 1232 c^{5} + 220 c^{3} - 11 c \\ &2048 c^{12} - 6144 c^{10} + 6912 c^{8} - 3584 c^{6} + 840 c^{4} - 72 c^{2} + 1 \end{aligned}

These polynomials are actually quite famous - they're called the Chebyshev polynomials of the first kind and you encounter them a lot in a branch of mathematics called approximation theory. The "first kind" in the name suggests the existence of a second kind, so let's see how to derive it. For the y coordinates we can factor out a single s to get all even powers and perform the same substitution:

for y in ys[1:]: # Skip the first entry because it's an integer
    # Divide by 's' first to make the powers even and then
    # multiply by it after we perform the substitution
    yn = (y / s).expand().subs(s**2, 1 - c**2).expand() * s
    print(yn)

\begin{aligned} &s \\ &s (2 c) \\ &s (4 c^{2} - 1) \\ &s (8 c^{3} - 4 c) \\ &s (16 c^{4} - 12 c^{2} + 1) \\ &s (32 c^{5} - 32 c^{3} + 6 c) \\ &s (64 c^{6} - 80 c^{4} + 24 c^{2} - 1) \\ &s (128 c^{7} - 192 c^{5} + 80 c^{3} - 8 c) \\ &s (256 c^{8} - 448 c^{6} + 240 c^{4} - 40 c^{2} + 1) \\ &s (512 c^{9} - 1024 c^{7} + 672 c^{5} - 160 c^{3} + 10 c) \\ &s (1024 c^{10} - 2304 c^{8} + 1792 c^{6} - 560 c^{4} + 60 c^{2} - 1) \\ &s (2048 c^{11} - 5120 c^{9} + 4608 c^{7} - 1792 c^{5} + 280 c^{3} - 12 c) \end{aligned}

The polynomials in the brackets are known as Chebyshev polynomials of the second kind, and, altogether, these formulas are called the multiple-angle formulas in trigonometry. I was required to memorize them in school (at least for the low powers), but, as you can see, they're actually pretty easy to derive with the help of a computer.

We examined the patterns in the powers and all that's left are the coefficients, but before we tackle those, there are a few simplifications I'd like to make in the context of our optimization problem. The first observation is that l = \sqrt{1 + h^2}, which is the denominator for both c and s, is very close to 1 for small values of h, so we might as well set it to 1. This simplification effectively removes the normalization step and will introduce increasing error with each rotation - remember, the length of the output vector is the product of the lengths of the input vectors. However, the direction of the output vector doesn't change so we can always normalize the final result if necessary (assuming we had arbitrary precision to work with).

Also, since we're only interested in small values of h, we might as well make that explicit by replacing all its occurrences with 1/h and working with large values of h instead. Over the years, I've found that this tiny adjustment can greatly aid one's intuition.

These modifications result in c = 1 and s = 1/h. Plugging the new values into our multiple-angle formulas we get:

rotations = 12
s = 1 / var('h')

x = 1
y = 0
xs = [x]
ys = [y]

for _ in range(rotations):
    xn = x - s*y
    yn = s*x + y

    xn = xn.expand()
    yn = yn.expand()

    xs.append(xn)
    ys.append(yn)
    x = xn
    y = yn

print('x coordinates:')
for x in xs:
    print(x)

print('\ny coordinates:')
for y in ys:
    print(y)

n | x | y
0 | 1 | 0
1 | 1 | h^{-1}
2 | 1 - h^{-2} | 2h^{-1}
3 | 1 - 3h^{-2} | 3h^{-1} - h^{-3}
4 | 1 - 6h^{-2} + h^{-4} | 4h^{-1} - 4h^{-3}
5 | 1 - 10h^{-2} + 5h^{-4} | 5h^{-1} - 10h^{-3} + h^{-5}
6 | 1 - 15h^{-2} + 15h^{-4} - h^{-6} | 6h^{-1} - 20h^{-3} + 6h^{-5}
7 | 1 - 21h^{-2} + 35h^{-4} - 7h^{-6} | 7h^{-1} - 35h^{-3} + 21h^{-5} - h^{-7}
8 | 1 - 28h^{-2} + 70h^{-4} - 28h^{-6} + h^{-8} | 8h^{-1} - 56h^{-3} + 56h^{-5} - 8h^{-7}
9 | 1 - 36h^{-2} + 126h^{-4} - 84h^{-6} + 9h^{-8} | 9h^{-1} - 84h^{-3} + 126h^{-5} - 36h^{-7} + h^{-9}
10 | 1 - 45h^{-2} + 210h^{-4} - 210h^{-6} + 45h^{-8} - h^{-10} | 10h^{-1} - 120h^{-3} + 252h^{-5} - 120h^{-7} + 10h^{-9}
11 | 1 - 55h^{-2} + 330h^{-4} - 462h^{-6} + 165h^{-8} - 11h^{-10} | 11h^{-1} - 165h^{-3} + 462h^{-5} - 330h^{-7} + 55h^{-9} - h^{-11}

The separation between even and odd powers for the x and y coordinates is even more apparent now. We can make SymPy extract the coefficients if we convert the formulas to polynomials. The all_coeffs() method will also insert zeros for all missing powers between the highest and the lowest:

def extractCoefficients(expr):
    # poly will error out if a constant expression is passed in
    coeffs = [expr] if expr.is_constant() else poly(expr).all_coeffs()
    coeffs.reverse()
    return coeffs

for x in xs:
    print(extractCoefficients(x))

for y in ys:
    print(extractCoefficients(y))

Padding the missing columns with zeros results in the following table of coefficients (the x and y coefficients are merged because the powers don't overlap):

n       |  0  1  2  3  4   5   6   7   8    9   10   11   12
h^0     |  1  1  1  1  1   1   1   1   1    1    1    1    1
h^{-1}  |  0  1  2  3  4   5   6   7   8    9   10   11   12
h^{-2}  |  0  0 -1 -3 -6 -10 -15 -21 -28  -36  -45  -55  -66
h^{-3}  |  0  0  0 -1 -4 -10 -20 -35 -56  -84 -120 -165 -220
h^{-4}  |  0  0  0  0  1   5  15  35  70  126  210  330  495
h^{-5}  |  0  0  0  0  0   1   6  21  56  126  252  462  792
h^{-6}  |  0  0  0  0  0   0  -1  -7 -28  -84 -210 -462 -924
h^{-7}  |  0  0  0  0  0   0   0  -1  -8  -36 -120 -330 -792
h^{-8}  |  0  0  0  0  0   0   0   0   1    9   45  165  495
h^{-9}  |  0  0  0  0  0   0   0   0   0    1   10   55  220
h^{-10} |  0  0  0  0  0   0   0   0   0    0   -1  -11  -66
h^{-11} |  0  0  0  0  0   0   0   0   0    0    0   -1  -12
h^{-12} |  0  0  0  0  0   0   0   0   0    0    0    0    1

Our remaining task is to find an algorithm that generates these coefficients for the nth rotation. This is a daunting task if you're not used to this kind of work but I'll show you that it's not as difficult as it looks!

Generating the coefficients for h^0 and h^{-1} is easy - they're obviously 1 and n respectively, where n denotes the rotation count. h^{-2} is where the pattern is not immediately obvious unless you have some experience with integer sequences. The easiest thing to do is look up the coefficients in the Online Encyclopedia of Integer Sequences (OEIS), which is an incredible resource that will save you lots of time if you frequently need to crack the patterns behind integer sequences. I tend to look up sequences in the OEIS all the time but what if you didn't know about it or couldn't find yours in it? I'll show you a surprisingly versatile technique that can be used to analyse all kinds of sequences.

First, you want to find out how the sequence terms change over different values of n, and the most natural way of doing that is computing the difference between consecutive terms. Those differences will form another sequence and you repeat the same process with it until the consecutive differences become constant (comprised of the same value, e.g. 3, 3, 3, ...), increase linearly (e.g. 1, 2, 3, 4, ...), or you recognise the pattern in them. Until you get better intuition about the process, I recommend you keep taking consecutive differences until they're constant.
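The technique is mechanical enough to script; here's a minimal sketch (run against the h^{-2} coefficients from the table above, signs dropped):

def consecutiveDifferences(seq):
    return [b - a for a, b in zip(seq, seq[1:])]

coeffs = [0, 0, 1, 3, 6, 10, 15, 21, 28]
print(consecutiveDifferences(coeffs))                         # [0, 1, 2, 3, 4, 5, 6, 7]
print(consecutiveDifferences(consecutiveDifferences(coeffs))) # constant: [1, 1, 1, 1, 1, 1, 1]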

The next step relies on the observation that the kth integer of a sequence S can be obtained by summing the first k consecutive differences of the sequence and adding them to the initial integer. Integers in the sequence of differences can, in turn, be computed as sums of their own consecutive differences - it's nested sums of differences all the way down! The skill of manipulating discrete sums is often neglected outside treatments of discrete mathematics, but it's an essential asset even for disciplines like analysis. I may write about it more in the future but, for now, let's return to the task at hand.

Going back to the coefficients of h^{-2}, the difference between 10 (for n = 5) and 6 (for n = 4) is 4. If you repeat this process for all of the coefficients, you'll find that the differences between them form the following sequence: 1, 2, 3, 4, 5, 6, ... In other words, f(n) - f(n - 1) = n, where f is the mystery function we're trying to find (indexed from the first non-zero coefficient). Rearranging the equation we get f(n) = n + f(n - 1) which makes it a little more obvious that getting to the nth coefficient is as simple as adding n to the (n - 1) value. Starting from the initial value we get f(1) = 0 + 1, f(2) = f(1) + 2 = 0 + 1 + 2, f(3) = f(2) + 3 = 0 + 1 + 2 + 3 and our mystery function f(n) appears to be the sum of all integers from 1 to n. If you don't remember the formula for that sum, there's an easy way to derive it.

As usual, let's start simple: S_4 = 1 + 2 + 3 + 4. Notice that 1 + 4 = 2 + 3 = 5. Moving on, S_5 = 1 + 2 + 3 + 4 + 5 and 1 + 5 = 2 + 4 = 3 + 3 = 6. This observation suggests a general pattern: S_n = (1 + n) + (2 + n - 1) + (3 + n - 2) + \ldots = (n + 1) + (n + 1) + (n + 1) + \ldots How many of those (n + 1) terms do we have? Since we grouped the numbers in pairs of two we have n/2 terms (for odd n the middle number pairs with itself and contributes half of an (n + 1) term, so the count still holds). Putting it all together:

S_{n} = \frac{n (n + 1)}{2}

You can check for a few values of n that we do indeed get the same numbers as the coefficients for h^{-2}. However, there's a minor problem: If we plug in 2 into the formula we'd expect to get a 1, because that's the coefficient of h^{-2} for the second rotation, but what we get back instead is 3, which is the coefficient for the third rotation. No problem, we can just shift the function by replacing n with n - 1, resulting in the following formula for the coefficients C_2 of h^{-2}:

C_2 = \frac{n (n - 1)}{2}

At this point you can repeat the same process for the other coefficients (don't forget to apply the appropriate shifting where necessary), though you might already have a pretty good guess about the general pattern. Each time we increase the exponent e (technically decrease in our case, but the sign is irrelevant), we multiply the formula for the previous coefficient by (n - e + 1) and divide by e. For example, the formula for the coefficients C_3 of h^{-3} is:

C_3 = \frac{n (n - 1) (n - 2)}{2 \cdot 3}

and in general:

C_{k} = \frac{n (n - 1) (n - 2) \ldots (n - k + 1)}{k!}

where n represents the rotation count and k is the power of h whose coefficient we're trying to determine. Depending on your maths background, you might recognise this as the binomial coefficients formula, usually denoted \binom{n}{k}. Now is a good time to write the two functions that generate the nth rotation polynomials for x and y so we can check whether we made any mistakes along the way:

h, n = var('h, n')

def rotateX(rotationCount):
    hs = h * h
    accum = 0
    term = 1
    for i in range(1, rotationCount):
        # The sign has to alternate hence the minus in front
        # We want to generate 2!, 4!, 6!, (2*i)!, which can be split into:
        # (1*2) * (3*4) * (5*6) * ... * [(2*i - 1)*(2*i)] = i*(4*i - 2)
        denom = -hs * i * (4 * i - 2)
        # For the numerator we want the following sequence:
        # [(n - 0)*(n - 1)]*[(n - 2)*(n - 3)]*...*[(n - 2*i + 2)*(n - 2*i + 1)]
        term *= (n - 2*i + 2) * (n - 2*i + 1) / denom
        accum += term
    return accum + 1

def rotateY(rotationCount):
    if rotationCount == 0:
        return 0

    hs = h * h
    accum = n / h
    term = accum
    for i in range(1, rotationCount):
        # The sign has to alternate hence the minus in front
        # We want to generate 3!, 5!, 7!, (2*i + 1)!, which can be built up as:
        # (2*3) * (4*5) * (6*7) * ... * [(2*i)*(2*i + 1)], where each step is i*(4*i + 2)
        denom = -hs * i * (4 * i + 2)
        # Because we start with n already in the term's numerator, we need to
        # generate the following sequence:
        # [(n - 1)*(n - 2)]*[(n - 3)*(n - 4)]*...*[(n - 2*i + 1)*(n - 2*i)]
        term *= (n - 2*i + 1) * (n - 2*i) / denom
        accum += term
    return accum

def rotate(rotationCount):
    print(f'x after {rotationCount} rotations:')
    x = rotateX(rotationCount)
    print(x.subs(n, rotationCount))

    print(f'y after {rotationCount} rotations:')
    y = rotateY(rotationCount)
    print(y.subs(n, rotationCount))

Calling rotate(6) gives us what we'd expect:

\begin{aligned} x_6 &= 1 - 15 h^{-2} + 15 h^{-4} - h^{-6} \\ y_6 &= 6 h^{-1} - 20 h^{-3} + 6 h^{-5} \end{aligned}

For our specific algorithm, n, representing the number of rotations, is not an arbitrary number. Recall that the rotation count was determined by evaluating d/h, which becomes d h after we replace h with its reciprocal. To understand how that's going to affect the final output for large values of n, let's substitute n = d h in our rotation formulas for a small rotation count, like 3, and expand:

expanded = lambda expr : expr.subs(n, var('d') * h).expand()
print(expanded(rotateX(3)))
print(expanded(rotateY(3)))

\begin{aligned} x_3 &= 1 - \frac{d^{2}}{2} + \frac{d^{4}}{24} + \bigg( - \frac{d^{3}}{4 h} + \frac{11 d^{2}}{24 h^{2}} + \frac{d}{2 h} - \frac{d}{4 h^{3}} \bigg) \\ y_3 &= d - \frac{d^{3}}{6} + \frac{d^{5}}{120} + \bigg( - \frac{d^{4}}{12 h} + \frac{7 d^{3}}{24 h^{2}} + \frac{d^{2}}{2 h} - \frac{5 d^{2}}{12 h^{3}} - \frac{d}{3 h^{2}} + \frac{d}{5 h^{4}} \bigg) \end{aligned}

I grouped all the terms that have h in the denominator because we're interested in huge values of h, which - combined with the factorial in the denominator - would make the terms in the brackets tiny; I'll call those terms the vanishing group. One crucial observation is that C_k is a polynomial in n of the same degree as the corresponding negative power of h, and the coefficient of the highest power term in its numerator is guaranteed to be 1. This is why it makes sense to have terms where the h values in the denominator cancel out the ones in the numerator, leaving only a power of d on top. Generating the polynomials for 4 rotations makes our observation a bit more obvious:

\begin{aligned} x_4 = \; &1 - \frac{d^{2}}{2} + \frac{d^{4}}{24} - \frac{d^{6}}{720} + \bigg( \frac{d^{5}}{48 h} - \frac{17 d^{4}}{144 h^{2}} - \frac{d^{3}}{4 h} + \frac{5 d^{3}}{16 h^{3}} + \frac{11 d^{2}}{24 h^{2}} - \frac{137 d^{2}}{360 h^{4}} + \frac{d}{2 h} - \frac{d}{4 h^{3}} + \frac{d}{6 h^{5}} \bigg) \\ y_4 = \; & d - \frac{d^{3}}{6} + \frac{d^{5}}{120} - \frac{d^{7}}{5040} + \bigg( \frac{d^{6}}{240 h} - \frac{5 d^{5}}{144 h^{2}} - \frac{d^{4}}{12 h} + \frac{7 d^{4}}{48 h^{3}} + \frac{7 d^{3}}{24 h^{2}} - \frac{29 d^{3}}{90 h^{4}} + \frac{d^{2}}{2 h} - \frac{5 d^{2}}{12 h^{3}} + \frac{7 d^{2}}{20 h^{5}} \\ &- \frac{d}{3 h^{2}} + \frac{d}{5 h^{4}} - \frac{d}{7 h^{6}} \bigg) \end{aligned}

Notice that all the terms from the 3-rotation formulas are still present. What would the general structure of the formulas look like as we add more and more rotations? It seems like the group comprised of even or odd powers of d with the factorial of the power in the denominator will continue to expand. Furthermore, the larger h becomes, the smaller the total sum of the vanishing group will be; and if we got rid of it, we'd be left with very pleasant expressions for our x and y coordinates:

\begin{aligned} x_{n} &= 1 - \frac{d^2}{2!} + \frac{d^4}{4!} - \frac{d^6}{6!} + \frac{d^8}{8!} - \ldots \\[0.4em] y_{n} &= d - \frac{d^3}{3!} + \frac{d^5}{5!} - \frac{d^7}{7!} + \frac{d^9}{9!} - \ldots \end{aligned}

Mathematicians have a name for these kinds of formulas: series. Take the time to fully process what we've just done because it can be quite disorienting when you first encounter such a transformation.

We converted the coefficients into generating polynomials in n, and that gave us additional information about the structure of the computation. A beautiful pattern was revealed after substituting for n: Each rotation in the brute-force algorithm decomposed into a polynomial in d and a separate group of terms with h in the denominator. This allowed us to determine that for very large values of h - and therefore potentially billions of rotations - the polynomial we'd get in the end would have the same general structure, and so we could simplify the computation by eliminating the part of it that contributes very little to the final result - the vanishing group.

However, we still have a problem: How do we know when to stop? Surely if we continue carelessly adding terms we run the risk of overshooting the target location on the circle! Moreover, how do we even know that the final result will actually match - or be close enough to - what the brute-force algorithm would give us?

These are not trivial questions, and over several centuries, mathematicians have developed a lot of tools to help answer them. If you were studying these series in isolation, you'd have to look for insights within a field called analysis. But we might be able to take advantage of the context in which we discovered them to help guide us towards more intuitive answers.

Let's go back to the premise behind the brute-force algorithm: The larger we make h, the more rotations we need to perform, and the closer we'll get to the target location. Performing more rotations would result in generating more and more terms in the series, while the vanishing group would continue to shrink. This directly implies that the two series will not only converge on finite values, but those values will close in on the location that we're looking for! While this is not considered a formal proof, I find this kind of intuitive reasoning remarkably beautiful.

To wrap up this section, let's translate the series to Python functions:

def cosB(x, iters = 11):
    xs = x*x
    accum = 1
    term = 1
    for i in range(1, iters+1):
        term *= -xs/(i*(4*i - 2))
        accum += term
    return accum

def sinB(x, iters = 11):
    xs = x*x
    accum = x
    term = x
    for i in range(1, iters+1):
        term *= -xs/(i*(4*i + 2))
        accum += term
    return accum

Evaluating these functions with d = 2 gives us a similar result to our brute force method:

Bruteforce:
d=2, h=0.00001, rotations=200000
x=-0.41614683648618206
y=0.9092974268526915

Series:
x=-0.41614683654714246
y=0.9092974268256817

The output values match up to the 9th and 10th decimal respectively (the series x is actually more accurate), yet we only ran a total of 22 iterations compared to the 200000 for the bruteforce method!

You might have noticed that I picked rather peculiar names for the functions instead of calling them, say, coordinateX and coordinateY. After all, we're computing the coordinates we'll arrive at if we travelled x distance units along the circle. Unfortunately, mathematics terminology is stuck with these mistranslations of Arabic texts so you'll typically see the functions computing the x and y coordinates being called cosine (cos) and sine (sin) respectively. I decided to use those names as well, even though I dislike them, because they're already in the title of this article and will likely be more familiar to you.

Congratulations if you made it this far! The hardest part is over, but there are a few more optimizations we can leverage.

Range Reduction

Suppose you travelled 0.5 turns along the circle, took a short break, and then decided to continue moving for another full turn. Unless you're trying to exercise, this is clearly pointless because you'll end up in the same spot after travelling a total of 1.5 turns from the starting point. In fact, any whole number multiple of one turn will bring you back to the same location. If I asked you to compute the location where you'll end up after travelling 42.5 turns, you don't actually need to plug that value directly into our cosine and sine formulas - assuming you knew how to convert turn units into distances - because you can simply subtract 42 and plug in 0.5 instead.
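In code, the whole-turn stripping is a one-liner (a tiny sketch in turn units; we'll handle distance units shortly):

import math

def reduceTurns(turns):
    # 42.5 turns lands in the same spot as 0.5 turns
    return turns - math.floor(turns)

print(reduceTurns(42.5)) # 0.5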

This simple observation allows us to reduce any arbitrary distance to the range 0 to 1 turns, which is highly desirable from an algorithmic standpoint since it puts a fixed bound on the number of computations we need to perform (given some desired degree of accuracy). However, the pesky turn-to-distance conversion problem is blocking our progress once again! Let's figure out how to measure the length of the unit circle's boundary (its circumference) and put the conversion beast to rest once and for all!

Unit Circle Circumference

As you might suspect by now, any time I'm faced with a new problem I always attempt to solve it in the simplest way I can think of because that gives me a baseline for comparisons with potential future solutions - it also provides me with initial insight into the structure of the problem, which, as you learned in the previous sections, can inspire clever optimizations. Using the tools we've already developed, what would be your initial attempt at measuring the circumference of a unit circle?

My first thought was to simplify the measurement by relying on the fact that the full length is comprised of 4 equal pieces, belonging to each quadrant (quarter turn slice of the circle), and so computing the circumference can be done by measuring the length of the counterclockwise arc from (1, 0) to (0, 1) and multiplying the resulting value by 4.

Next, I knew I could rotate in tiny increments until the x coordinate becomes negative (which would indicate that we crossed the (0, 1) vector). Recording the number of rotations we performed before crossing the vertical line and multiplying that value by the chosen h height of the rotated triangle will give us an approximation for the length of the quadrant arc. Alternatively, we could use our cosine formula and slowly increment the distance input until the computed value becomes negative. The latter approach seemed more convenient so I went with it:

x = 0 # starting value and also the upper bound at the end
increment = 0.1 # increment x by this value on each iteration
xprev = x # store the previous value to get a lower bound

while True:
    x += increment
    if cosB(x) < 0:
        break
    xprev = x

print(f'Lower bound: {xprev}\nUpper bound: {x}')
Lower bound: 1.5000000000000002
Upper bound: 1.6000000000000003

I started with a coarse approximation to get a general sense for the interval in which the true value resides. Once I knew it's somewhere between 1.5 and 1.6, I changed x to 1.5, the increment to 0.0000001, and ran the same piece of code again:

Lower bound: 1.5707963000413356
Upper bound: 1.5707964000413357

You could continue substituting the lower bound for x and decreasing the increment to get better and better approximations but that felt too clunky and inelegant so I decided to change my approach. There are two main problems with our current implementation: The increment is fixed so we need to manually adjust it, and we're also terminating the computation as soon as the cosine becomes negative. Can you think of a way to modify the algorithm so the increment is decreased automatically and the negative value check is eliminated?

If you think about the behaviour of cosine around the (0, 1) point, the closer we get to it, the smaller the cosine value becomes. This means that we could use the computed cosines as the increments for each step! Furthermore, as soon as the cosine becomes negative, its value will automatically decrease the approximated distance, and we could use this self-correcting behaviour to get rid of the if statement that terminates the computation. Implementing these adjustments gives us the following quadrant arc length algorithm:

def quadrantArcLength(iters = 2):
    x = 1.57
    for _ in range(0, iters):
        x = x + cosB(x)
    return x

pi = quadrantArcLength()*2
circumference = pi*2
print(f'Pi: {pi}\nCircumference: {circumference}')
Pi: 3.141592653589793
Circumference: 6.283185307179586

Imagine yourself picking up a circle, cutting the boundary curve somewhere, and straightening it out into a line. The length of that line is the circumference (often denoted τ and pronounced ta-oo) and is roughly 6.28 times longer than the radius (which is 1 for a unit circle). Dividing the circumference by 2 will give us the famous mathematical constant π = 3.141592653589793... For the sake of convenience, I'll start expressing circular distances in terms of π: The length of a quadrant arc then becomes π/2, half a turn would be π, the circumference is 2π, and so on. As a proponent of the τ notation this wasn't an easy decision, but it is the more common approach.

Knowing the circumference also helps answer the turn value question: One turn is equal to ~6.28 distance units (also called radians) and with the help of our handy cosine and sine functions, we can map turn units to coordinates on the circle by first converting the turn value to its radians representation! Over the years, people have come up with various conventions where the distance unit range [0, 6.283185307179586] is mapped to more convenient values such as [0, 360] (degrees) and the [0, 1] turns we're already familiar with.
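Converting between these conventions is just scaling; here's a quick sketch (helper names are mine, reusing the circumference value computed above):

def turnsToRadians(turns):
    return turns * circumference   # one turn is ~6.28 radians

def degreesToRadians(degrees):
    return degrees * circumference / 360

print(turnsToRadians(0.25)) # ~1.5707963267948966, i.e. pi/2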

If you're curious about the more general principle behind the quadrant arc length algorithm, you can look up the term fixed-point iteration. Briefly, the idea is to leverage structural properties of a given function to iteratively close in on a desired value. The Babylonians, for example, used a similar approach to derive an algorithm for computing square roots.
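To give a flavour of the idea, here's a sketch of that Babylonian square root recipe (my own minimal rendition of the scheme):

def babylonianSqrt(a, iters = 8):
    # Fixed-point iteration: averaging x with a/x nudges x towards
    # the value where x == a/x, which is exactly the square root
    x = a
    for _ in range(iters):
        x = (x + a/x) / 2
    return x

print(babylonianSqrt(2)) # ~1.4142135623730951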

We now have a good approximation for the circumference of the unit circle so we can divide any given distance value by it to find how many turns around the circle we'll make, and then subtract the distance travelled in those turns from the input to get a range reduced value for our computations:

circ = circumference # the unit circle circumference (2*pi) we computed above

def rangeReduce(x):
    # In Python you could also do x % circ to compute the same value
    # but I'm being explicit here for the sake of clarity
    k = math.floor(x / circ)
    x = x - k * circ
    return x

Let's augment our base cosine and sine functions:

def cos1(x, iters = 11):
    x = rangeReduce(x) # same as x = x % circ
    return cosB(x, iters)

def sin1(x, iters = 11):
    x = rangeReduce(x) # same as x = x % circ
    return sinB(x, iters)

So far, we've managed to reduce our input range from [0, ∞) to [0, 2π], but we're not done yet!

Quadrants

Having computed the length of a single quadrant, we have sufficient information to determine which quadrant we'll end up in if we computed the cosine and sine values for a given input. For example, if the input distance (after the initial range reduction) is larger than π/2 we know it won't be in the first quadrant; if it's larger than π, it won't be in the first or second quadrants. We can binary search for the correct quadrant and find it in just two checks, but how does that help us?

If you pick some vector in the first quadrant with coordinates (cos(x), sin(x)), you can rotate it around the circle using the familiar swapping of coordinates and negating the x value. Each rotation corresponds to a π/2 increase in the distance x, which means that, for example, cos(x + π/2) = -sin(x) because that's the x coordinate in the 2nd quadrant. If we compute all the adjusted values we get the following table:

Quadrant | Coordinate | Adjusted
1 | (cos(x), sin(x)) | (cos(x), sin(x))
2 | (-sin(x), cos(x)) | (cos(x + π/2), sin(x + π/2))
3 | (-cos(x), -sin(x)) | (cos(x + π), sin(x + π))
4 | (sin(x), -cos(x)) | (cos(x + 3π/2), sin(x + 3π/2))

Now, suppose you reversed this process and started with an unknown vector whose associated distance value places it in the 3rd quadrant. Instead of computing its location directly using the cosine and sine formulas, you could subtract π from the distance, calculate the cosine and sine in the [0, π/2] range and then substitute the computed values in the coordinate column for the 3rd quadrant in the table above - in this case, all you need to do is negate them:

def cos2(x, iters = 11):
    x = x % (2*pi)

    if x < pi:
        if x < pi/2:
            return cosB(x, iters)
        return -sinB(x - pi/2, iters)

    if x < pi*3/2:
        return -cosB(x - pi, iters)
    return sinB(x - pi*3/2, iters)

def sin2(x, iters = 11):
    x = x % (2*pi)

    if x < pi:
        if x < pi/2:
            return sinB(x, iters)
        return cosB(x - pi/2, iters)

    if x < pi*3/2:
        return -sinB(x - pi, iters)
    return -cosB(x - pi*3/2, iters)

Mirror Symmetry

We're almost done with the range reduction optimizations! There's just one more structural property we haven't exploited yet: mirror symmetry (reflection). Let's take a vector (x, y) as an example. Switching the x and the y will give us (y, x), which is a reflection around the y = x axis. If we set x = cos(x) and y = sin(x) we get the following picture:

Note how the line y = x splits the arc bounded by 0 and π/2 in half. Maybe we can somehow leverage the symmetry around the half-way distance, π/4, to shrink the [0, π/2] range to [0, π/4]!

More concretely, suppose you were interested in calculating the cosine and sine values for a circular distance value x between π/4 and π/2. Swapping the cosine and sine values will give us the reflected vector that lies somewhere between 0 and π/4. How do we compute the distance value - let's call it r - that corresponds to the reflected vector? If we knew how to calculate r, we could compute cos(x) = sin(r) and sin(x) = cos(r)!

The key observations that will help us with this problem are that (1, 0) gets reflected into (0, 1) and the fact that reflections preserve distances, so the circular distance between the horizontal base (1, 0) and (cos(r), sin(r)) is the same as the circular distance between the vertical base (0, 1) and (cos(x), sin(x)). This means that if we subtract the value of x from π/2, we'll find r! Putting everything together we arrive at a very handy relation between cosines and sines:

\begin{aligned} \cos(x) &= \sin\bigg(\frac{\pi}{2} - x\bigg) \\ \sin(x) &= \cos\bigg(\frac{\pi}{2} - x\bigg) \end{aligned}
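A quick numerical spot check with the series functions we already have (an informal test, reusing cosB, sinB, and the pi value computed earlier):

x = 0.3
print(cosB(x), sinB(pi/2 - x)) # both ~0.955336489125606
print(sinB(x), cosB(pi/2 - x)) # both ~0.29552020666133955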

To make use of this relation in our cosine and sine code, we could check whether the range reduced x is larger than π/4 and subtract it from π/2 if it is - in addition to swapping cosines for sines and vice versa. However, if we apply this range reduction directly, we'll end up with very unpleasant code because we're already alternating between cosB and sinB statements all over the place. Let's clean up the code a bit in preparation for the final range reduction step by rewriting cos2 so it only calls cosB and get rid of all calls to cosB in sin2. We can use the formulas we've just discovered to perform this transformation. For example, we can convert sinB(x - π/2) to cosB by negating the input value and adding π/2 to it:

\begin{aligned} \mathrm{sinB}\bigg(x - \frac{\pi}{2}\bigg) &= \mathrm{cosB}\bigg(-\bigg(x - \frac{\pi}{2}\bigg) + \frac{\pi}{2}\bigg) \\ &= \mathrm{cosB}(\pi - x) \end{aligned}

This would allow us to make the code a lot more uniform and we can apply the [0, π/4] range reduction as the final step:

def cos3(x, iters = 11):
    # [0, 2*pi] range reduction
    x = x % (2*pi)
    sign = 1
    # [0, pi/2] range reduction
    if x < pi:
        if x < pi/2:
            pass # leaving this for clarity
        else:
            x = pi - x
            sign *= -1
    else:
        if x < pi*3/2:
            x = x - pi
            sign *= -1
        else:
            x = 2*pi - x
    # [0, pi/4] range reduction
    c = cosB(x, iters) if x < pi/4 else sinB(pi/2 - x, iters)
    return c*sign

def sin3(x, iters = 11):
    # [0, 2*pi] range reduction
    x = x % (2*pi)
    sign = 1
    # [0, pi/2] range reduction
    if x < pi:
        if x < pi/2:
            pass
        else:
            x = pi - x
    else:
        sign *= -1
        if x < pi*3/2:
            x = x - pi
        else:
            x = 2*pi - x
    # [0, pi/4] range reduction
    s = sinB(x, iters) if x < pi/4 else cosB(pi/2 - x, iters)
    return s*sign

Negative Angles

I've been ignoring negative distances so far to keep the incremental development of the algorithms more focused. Fortunately, addressing this omission is fairly straightforward!

First, what does a negative distance mean and does it even make sense to consider it? After all, we could always restrict the inputs to our cosine and sine functions to positive values only. Intuitively, I think of negative distances as rotations in the opposite direction - clockwise instead of the counterclockwise direction we've been working with so far. However, just because that makes sense to me intuitively doesn't mean it will fit the analysis we've done so far, or that our algorithms will produce a meaningful result when a negative value is fed into them.

In the previous section we actually ran into a disguised negative input: $\cos(\pi/2 - x)$ and $\sin(\pi/2 - x)$. Because our rotations compose, we can interpret those expressions as rotating the vector located a circular distance of $\pi/2$ units from $(1, 0)$ - which is $(0, 1)$ - by the vector $(\cos(-x), \sin(-x))$. Moreover, we already know what the final result is: $(\sin(x), \cos(x))$. If we're going with the idea that negative distances are rotations in the opposite direction, we can cancel out the $\pi/2$ value in $\pi/2 - x$ by performing a $\pi/2$ clockwise rotation whose formula (I'll call it $\mathrm{rotQc}$) can be established with the same thought process we used all the way back in the beginning:

$$\mathrm{rotQc}((a, b)) = (b, -a)$$

We're still exchanging the coordinate values but this time it's the second one that gets negated. Applying $\mathrm{rotQc}$ to our vector of interest yields the following:

$$\begin{aligned} (\cos(-x), \sin(-x)) &= \mathrm{rotQc}((\cos(\pi/2 - x), \sin(\pi/2 - x))) = \mathrm{rotQc}((\sin(x), \cos(x))) \\ &= (\cos(x), -\sin(x)) \end{aligned}$$

This is just a vertical reflection of the original vector so it fits well with our starting idea of $h$ as the height of a right triangle. In fact, simply allowing $h$ to be negative in the repeated rotation calculations already gives us a good idea about the behaviour of a negative input:

The rotation direction is reversed as expected! A negative $h$ would also cancel out the negative sign in the circular distance input of our bruteforce cosine and sine algorithm when computing the rotation count. Plugging a negative circular distance value into the optimized formulas is equally pleasant: The even powers in the cosine will cancel the negative sign while the odd powers in the sine will flip the sign of the final result, which matches the results we obtained from leveraging the compositional property of 2D rotations! What this means is that we can treat negative inputs as positive ones as long as we remember to negate the sine value if the input was negative:

def cos4(x, iters = 11):
    # cosine is even, so we can drop the sign of the input
    return cos3(abs(x), iters)

def sin4(x, iters = 11):
    # sine is odd, so we negate the output for negative inputs
    if x < 0:
        return -sin3(-x, iters)
    return sin3(x, iters)
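
As a quick sanity check of the even/odd symmetry, we can compare against Python's math module (assuming the definitions above are in scope; the tolerance is arbitrary but generous for our 11-term series):

import math

for x in [0.1, 1.0, 2.5, 4.0]:
    assert abs(cos4(-x) - math.cos(-x)) < 1e-9
    assert abs(sin4(-x) - math.sin(-x)) < 1e-9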

Cosine and Sine Composition

Now is a good time to pause and examine an implicit assumption we've been making throughout the last few sections: The validity of replacing multiples of $\pi/2$ radians with quarter turn rotations. Suppose we wanted to evaluate $\cos(\pi/2 + x)$ and $\sin(\pi/2 + x)$. Does it make sense to pull the $\pi/2$ out of the function brackets, compute $(\cos(x), \sin(x))$, and apply a quarter turn rotation to this vector to produce the final result? In other words, is the following a valid transformation:

$$(\cos(\pi/2 + x), \sin(\pi/2 + x)) = \mathrm{rotQ}((\cos(x), \sin(x))) = (-\sin(x), \cos(x))$$

Recall our prior discussion of rotation composition in the beginning of this article: We can use vectors to represent rotations and then compose those vectors to produce a new vector that encodes the sum of the input rotations. The vector that encodes the $\pi/2$ rotation is $(0, 1)$, and the vector corresponding to a rotation by $x$ is $(\cos(x), \sin(x))$. If we compose these two vectors with the arbitrary rotation formula, we'll get an output vector that represents a rotation by $\pi/2$ followed by a rotation by $x$ (or the other way around), which is equivalent to evaluating the cosine and sine functions with $\pi/2 + x$ as input:

$$(0 \cdot \cos(x) - 1 \cdot \sin(x),\ 1 \cdot \cos(x) + 0 \cdot \sin(x)) = (-\sin(x), \cos(x))$$

This is precisely what we'd expect! There's actually nothing special about $\pi/2$ here - we can just as easily replace it with an arbitrary value $y$ and compute the composition of $(\cos(x), \sin(x))$ and $(\cos(y), \sin(y))$:

$$(\cos(x+y), \sin(x+y)) = (\cos(x)\cos(y) - \sin(x)\sin(y),\ \cos(x)\sin(y) + \sin(x)\cos(y))$$

Note that this is the more general form of the double angle formula - substitute $y=x$ to check - and you can take the output vector and rotate it by $(\cos(z), \sin(z))$ to produce the general triple angle formula, repeating this process for as long as you want. The algebraic nature of trigonometry is quite beautiful once you start seeing the patterns in it!
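
To make the pattern concrete, here's a small illustrative snippet (separate from our algorithm) that composes rotation vectors with the arbitrary rotation formula and checks the result against Python's math module:

import math

def compose(a, b):
    # rotation composition: (cos, sin) pairs combine as
    # (a0*b0 - a1*b1, a0*b1 + a1*b0)
    return (a[0]*b[0] - a[1]*b[1], a[0]*b[1] + a[1]*b[0])

x, y = 0.3, 1.1
rx = (math.cos(x), math.sin(x))
ry = (math.cos(y), math.sin(y))
print(compose(rx, ry))                   # ≈ (cos(x+y), sin(x+y))
print(math.cos(x + y), math.sin(x + y))
print(compose(rx, rx))                   # double angle: ≈ (cos(2x), sin(2x))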

Final Algorithm

It's been a long journey but we're finally ready to wrap things up! Putting everything together and eliminating some redundant bits results in the following code:

def cosB(x, iters = 11):
    # truncated cosine series: each term is the previous one
    # multiplied by -x^2/((2i - 1)*(2i)) = -x^2/(i*(4i - 2))
    xs = x*x
    accum = 1
    term = 1
    for i in range(1, iters+1):
        term *= -xs/(i*(4*i - 2))
        accum += term
    return accum

def sinB(x, iters = 11):
    # truncated sine series: each term is the previous one
    # multiplied by -x^2/((2i)*(2i + 1)) = -x^2/(i*(4i + 2))
    xs = x*x
    accum = x
    term = x
    for i in range(1, iters+1):
        term *= -xs/(i*(4*i + 2))
        accum += term
    return accum

def quadrantArcLength(iters = 2):
    # fixed-point iteration x -> x + cos(x): near the root of
    # cosine (pi/2) we have cos'(x) = -sin(x) ≈ -1, so each
    # step cancels almost all of the remaining error
    x = 1.57
    for _ in range(0, iters):
        x = x + cosB(x)
    return x

pi = quadrantArcLength() * 2

def cos(x, iters = 11):
    # [0, 2*pi] range reduction
    x = abs(x) % (2*pi)
    sign = 1
    # [0, pi/2] range reduction
    if x < pi:
        if x > pi/2:
            x = pi - x
            sign = -1
    elif x < pi*3/2:
        x = x - pi
        sign = -1
    else:
        x = 2*pi - x
    # [0, pi/4] range reduction
    c = cosB(x, iters) if x < pi/4 else sinB(pi/2 - x, iters)
    return c*sign

def sin(x, iters = 11):
    sign = 1
    if x < 0:
        sign = -1
        x = -x
    # [0, 2*pi] range reduction
    x = x % (2*pi)
    # [0, pi/2] range reduction
    if x < pi:
        if x > pi/2:
            x = pi - x
    else:
        sign *= -1
        if x < pi*3/2:
            x = x - pi
        else:
            x = 2*pi - x
    # [0, pi/4] range reduction
    s = sinB(x, iters) if x < pi/4 else cosB(pi/2 - x, iters)
    return s*sign
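
If you'd like to convince yourself that the assembled functions behave, a quick sweep against Python's math module (not part of the algorithm itself) should report a worst-case error close to double precision rounding for moderate inputs:

import math

worst = 0.0
for i in range(10001):
    x = (i - 5000) * 0.01  # sweep [-50, 50]
    worst = max(worst,
                abs(cos(x) - math.cos(x)),
                abs(sin(x) - math.sin(x)))
print(worst)  # expect a tiny value, roughly on the order of 1e-15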

Further Improvements

That was a fair bit of work but how close did we get to the actual implementations you'd find in calculator software or various maths libraries? Surprisingly, while our algorithm will be reasonably accurate for small distance values, the first step in the range reduction will become the source of increasing error for larger inputs. The limited precision of our $\pi$ calculation is the culprit, but addressing this limitation would require us to either use an arbitrary precision maths library - which would be simple but slow - or employ more clever - and complex - techniques. This is a common theme in numerical computing: Obtaining accurate results can be relatively easy, but doing that quickly and efficiently is where all the difficulty lies! If you're curious, you can look up the 1983 Payne-Hanek algorithm, which is often used as a reliable fallback whenever faster techniques can't handle a particular input value.

While all this may sound somewhat discouraging, our $[0, 2\pi]$ range reduction is actually good enough for a lot of applications (especially in the real-time domain). The following table shows the absolute error in our implementation - relative to Python's cosine and sine functions - for two fairly large inputs:

| $x$ | $\cos(x)$ error | $\sin(x)$ error |
| --- | --- | --- |
| $2^{32}-1$ | $8.31 \times 10^{-8}$ | $1.45 \times 10^{-7}$ |
| $2^{53}-1$ | $0.07$ | $0.34$ |

Inputs as large as $2^{53}-1$ have already lost all floating point precision so the large error values aren't particularly significant. $2^{32}-1$ is a far more reasonable input for double precision floats and the error values for it are small enough to be acceptable for most of the software I've worked on.
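
For reference, the numbers in the table can be reproduced along these lines (the exact digits may vary slightly between platforms and Python versions):

import math

for x in [2.0**32 - 1, 2.0**53 - 1]:
    print(x, abs(cos(x) - math.cos(x)), abs(sin(x) - math.sin(x)))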

We could further improve the performance of our algorithm by determining the lowest degree polynomial that would give us accurate results in the $[0, \pi/4]$ range for a desired precision. You can employ the Remez algorithm to find these polynomials, or use a framework like Sollya that will also take care of various floating point pitfalls for you.

Lastly, there's the concept of correctly rounded cosine and sine computations, which aim to determine the floating point value closest to the "exact" result. Due to their negative impact on performance and code size, correctly rounded algorithms are typically not used unless calculations need to be fully deterministic across different platforms.

I might explore some of these topics in future articles because they're too complex for an introductory presentation. However, you should now have a good sense for how cosine and sine functions are implemented in the real world!

Conclusion

Using the tools we developed you can easily derive other famous trigonometry formulas because they're mostly triangle identities in disguise. After all, the word "trigonometry" can be translated to "triangle measurement" from its Greek origin. All you need to do is construct a unit circle on one of the triangle's corners and exploit the fact that the circle will intersect the sides in segments with a length of 1 - the rest is just clever use of similarity ratios.

The goal of this write-up was to convince you that all the cryptic trigonometry relations you learned about in school are primarily algebraic in nature and naturally flow out of a small and simple set of ideas. One does not need to have any concept of the cosine and sine functions to do trigonometry, though they are still useful in applied mathematics. I deliberately used $c$ and $s$ instead to avoid needless confusion, and the mathematician Norman Wildberger went a step further and demonstrated that you can do trigonometry just fine with rational numbers alone.

I believe the subject would be a lot easier to grasp if schools taught vector algebra earlier and used it as a foundation for developing trigonometry (my teachers developed it on top of classical Euclidean geometry instead). Computer algebra systems would greatly alleviate a lot of the tedium inherent in the exploration of the algebraic structure of rotations, while simultaneously exposing students to the immensely practical field of programming.

Looking back on my maths education, I recall many times when I felt lost and confused. I had a rather complicated relationship with trigonometry in particular because I learned how to apply and manipulate the formulas, yet, regardless of how much "proficiency" I acquired, I could never shake off a certain nagging voice in my head: "You don't really understand what you're doing, you've just learned how to play the trigonometry game!"

I had high hopes that my university maths classes would help answer all the accumulated questions, yet the professor simply assumed that we already understood the subject and only gave us a brief refresher on the opaque formulas I had memorized in my teenage years. This article is one of the maths lessons I wish I could give to my younger self, and I hope it provided some value to you too!