Notes on Learning Rust
I’ve recently been learning Rust and I just finished The Rust Programming Language. As I went through it, I chased down the many questions I had using other resources like The Rustonomicon and various Google searches. It’s mostly these detours I want to write about.
This post is going to be more questions than answers since for a lot of these things I genuinely don’t know, or my research just led me to more questions.
Rust is not C
An example type I’ll be using a bit is ZOT [1], which I’m using as a marginally more complicated Option.
#[derive(Debug)] // needed so the values can be printed with {:?} later on
enum ZOT<T> { Zero, One(T), Two(T, T) }
The name stands for Zero One Two.
Let’s say I have a function that removes the rightmost element unless it’s Zero.
fn remove_last<T>(zot: &mut ZOT<T>) {
    match zot {
        ZOT::Zero => (),
        ZOT::One(_) => { *zot = ZOT::Zero; },
        // Does not compile: x is borrowed from zot here, not owned.
        ZOT::Two(x, _) => { *zot = ZOT::One(x); }
    }
}
Rust doesn’t love this code because I use the ZOT::One constructor with a borrowed object, and I really have no right to do so. That’s fair enough, but from my perspective it still looks correct. If I’m overwriting whatever’s stored at zot, surely zot no longer owns x?
If I replace the last case with panic!, everything works.
fn main() {
    let mut zot = ZOT::One("x");
    println!("{:?}", zot); // One("x")
    remove_last(&mut zot);
    println!("{:?}", zot); // Zero
}
If I compile this, the *zot = ZOT::Zero line looks like this:
lea rsi, [rsp + 16]
mov edx, 40
call memcpy@PLT
jmp .LBB74_2
which makes sense. With optimizations off it seems like we’re constructing ZOT::Zero on the stack and then we need to copy it over.
There’s nothing to indicate "x" is being dropped, but dropping it wouldn’t do anything anyway since it’s a string literal. What if we change that to "x".to_string()?
mov qword ptr [rsp + 16], 0
mov rax, qword ptr [rip + core::ptr::drop_in_place>@GOTPCREL]
call rax
jmp .LBB128_7
.LBB128_7:
mov rcx, qword ptr [rsp + 64]
mov qword ptr [rax + 48], rcx
movups xmm0, xmmword ptr [rsp + 16]
movups xmm1, xmmword ptr [rsp + 32]
movups xmm2, xmmword ptr [rsp + 48]
movups xmmword ptr [rax + 32], xmm2
movups xmmword ptr [rax + 16], xmm1
movups xmmword ptr [rax], xmm0
So, when we overwrite a value through a mutable reference, we drop what’s there first? That makes sense. I don’t know why memcpy was replaced with movups though.
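One way to check that reading without staring at assembly is to count drops. This is only a sketch: Noisy, overwrite and drops_after_overwrite are names I made up for the experiment, and the counter is just a Cell shared with the value.

```rust
use std::cell::Cell;

// Hypothetical helper type: bumps a shared counter when it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Assigning through the mutable reference drops the old contents.
fn overwrite<'a>(slot: &mut Option<Noisy<'a>>) {
    *slot = None;
}

// Returns how many drops have been observed after a single overwrite.
fn drops_after_overwrite() -> u32 {
    let drops = Cell::new(0);
    let mut slot = Some(Noisy(&drops));
    overwrite(&mut slot);
    drops.get()
}
```

If the old value is dropped at the assignment, the counter should already be 1 by the time overwrite returns, before slot itself goes out of scope.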
The difficulty here suggests to me that this is the wrong pattern, but it turns out there’s a function that does what I want: std::mem::replace.
        ZOT::Two(_, _) => {
            // Swap Zero in so the old value is owned rather than borrowed.
            if let ZOT::Two(x, _) = std::mem::replace(zot, ZOT::Zero) {
                *zot = ZOT::One(x);
            }
        },
This works, but it’s incredibly awkward. It’s easy to imagine a type where every variant contained generic parameters, so there would be no cheap placeholder value to swap in, and replace would no longer be helpful.
I don’t know if there’s a better way to do this than replace, but the sense I get from the difficulty of doing it this way is that this is a bad pattern in Rust, and the Rust language designers would much rather have less mutability.
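For what it’s worth, the replace version can be tightened a little by taking the old value out once and matching on the owned result. This is a sketch of the whole function under that assumption, not a claim that it’s the idiomatic answer; the PartialEq derive is only there so results can be compared.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
enum ZOT<T> {
    Zero,
    One(T),
    Two(T, T),
}

fn remove_last<T>(zot: &mut ZOT<T>) {
    // Move the old value out, leaving Zero behind as a placeholder,
    // then write back whatever the owned value turns into.
    *zot = match mem::replace(zot, ZOT::Zero) {
        ZOT::Zero | ZOT::One(_) => ZOT::Zero,
        ZOT::Two(x, _) => ZOT::One(x),
    };
}
```

It still relies on Zero being a variant that can be built without a T, so it doesn’t answer the worry about types where every variant carries data.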
Rust is not Haskell
One of the reasons I think Haskell gets a reputation for being tough is that it makes it very difficult to do things the non-Haskell way. If you want to write procedural code that works like a while-statement you can use a state monad. But if you’re at the point where you know what a state monad is, you know enough to be wary about using procedural structures.
That sounds like a similar lesson to what I just learned with Rust, but Rust has traits and traits are very similar to typeclasses, so now I’m going to try and treat Rust traits like typeclasses.
Here’s our good friend ZOT.
data ZOT a = Zero | One a | Two a a
Since it’s a list-like structure we can define Foldable for it.
instance Foldable ZOT where
  foldr f z Zero = z
  foldr f z (One x) = f x z
  foldr f z (Two x y) = f x (f y z)
Foldable isn’t the same as Iterable, and there seem to be attempts to bring Foldable into Rust, because googling around led me to some Foldable traits.
Instead, I’m going to focus on functors, since they’re a bit simpler and far more fundamental for a language to have if it wants to be Haskell.
instance Functor ZOT where
  fmap f Zero = Zero
  fmap f (One x) = One (f x)
  fmap f (Two x y) = Two (f x) (f y)
We can take inspiration from Option, which has this map function defined.
#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_const_unstable(feature = "const_option_ext", issue = "91930")]
pub const fn map<U, F>(self, f: F) -> Option<U>
where
    F: ~const FnOnce(T) -> U,
    F: ~const Destruct,
{
    match self {
        Some(x) => Some(f(x)),
        None => None,
    }
}
A few things stand out to me. First, calling this function in a const context is marked unstable, and I don’t quite understand why even after looking at the related issue. It’s related to ~const, which I don’t understand, and the issue links to a write-up about ~const which I understand even less.
Anyway, here’s my attempt at repurposing to ZOT.
pub const fn map<U, F>(self, f: F) -> ZOT<U>
where
    F: ~const Fn(T) -> U,
{
    match self {
        ZOT::Zero => ZOT::Zero,
        ZOT::One(x) => ZOT::One(f(x)),
        ZOT::Two(x, y) => ZOT::Two(f(x), f(y)),
    }
}
Rust doesn’t like this because ~const is experimental, and it doesn’t like this code without ~const because a const fn can’t call f unless its bound is ~const.
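As far as I can tell, the way out on stable Rust is to give up on const entirely. Here’s a sketch of map as an ordinary method; the choice of FnMut is mine (Option gets away with FnOnce because it calls f at most once, while Two calls it twice), and PartialEq is derived only so results can be compared.

```rust
#[derive(Debug, PartialEq)]
enum ZOT<T> {
    Zero,
    One(T),
    Two(T, T),
}

impl<T> ZOT<T> {
    // Plain (non-const) map: consumes self and applies f to each element,
    // left to right.
    pub fn map<U, F>(self, mut f: F) -> ZOT<U>
    where
        F: FnMut(T) -> U,
    {
        match self {
            ZOT::Zero => ZOT::Zero,
            ZOT::One(x) => ZOT::One(f(x)),
            ZOT::Two(x, y) => ZOT::Two(f(x), f(y)),
        }
    }
}
```

With this, something like ZOT::Two(1, 2).map(|n| n * 10) gives back a ZOT::Two(10, 20).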
This doesn’t spell good news for creating a trait, and sure enough the straightforward implementation doesn’t work.
trait Functor {
    fn map<A,B>(&self, f: |&A| -> B) -> Self<B>;
}
It says, “type arguments are not allowed on self type”. Why? I’m not sure. There’s actually a lot of discussion online about why there’s not a Functor trait, along with attempts to create one. But my conclusion here is that it’s not really possible to create one, without some sacrifices. There’s even more discussion with respect to monads.
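To make “some sacrifices” concrete, here is one shape such a trait can take using generic associated types (stable since Rust 1.65). The names Functor, Inner, Mapped and fmap are my own for this sketch, not anything from the standard library.

```rust
#[derive(Debug, PartialEq)]
enum ZOT<T> {
    Zero,
    One(T),
    Two(T, T),
}

// Hypothetical trait: the implementor names its element type and says
// what type it becomes when the elements have type B.
trait Functor {
    type Inner;
    type Mapped<B>;

    fn fmap<B, F>(self, f: F) -> Self::Mapped<B>
    where
        F: FnMut(Self::Inner) -> B;
}

impl<A> Functor for ZOT<A> {
    type Inner = A;
    type Mapped<B> = ZOT<B>;

    fn fmap<B, F>(self, mut f: F) -> ZOT<B>
    where
        F: FnMut(A) -> B,
    {
        match self {
            ZOT::Zero => ZOT::Zero,
            ZOT::One(x) => ZOT::One(f(x)),
            ZOT::Two(x, y) => ZOT::Two(f(x), f(y)),
        }
    }
}
```

The sacrifice is that nothing ties Mapped<B> back to ZOT. An implementation could set Mapped<B> to Vec<B> and the trait would accept it, which is exactly what Haskell’s Functor rules out by being defined over the type constructor itself.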
Just like I learned Rust isn’t C, Rust also isn’t Haskell. Specifically, in this case, because Rust doesn’t have higher-kinded types. I’m still trying to understand why that’s the case, and I’m not sure if it’s related to the ability to output code. Haskell has spoiled (ruined?) my thinking about types in that I’m comfortable thinking in terms of kinds, but now I know I’ll need to learn a different way of doing this.
Closures and Dynamic Dispatch
The last thing that I kept wondering throughout the book was how closures and dynamic dispatch work, and the answer I found is implicit structs and function pointers.
An example of something I used a lot while reading the book was unwrap_or_else. Initially I suspected functions that took a function were always inlined or had to follow a specific set of rules. I also noticed that bewildering ~const.
#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_const_unstable(feature = "const_option_ext", issue = "91930")]
pub const fn unwrap_or_else<F>(self, f: F) -> T
where
    F: ~const FnOnce() -> T,
    F: ~const Destruct,
{
    match self {
        Some(x) => x,
        None => f(),
    }
}
There’s a StackOverflow answer that does a good job explaining things. You put all the enclosed variables in a struct, and give that struct and the function to the calling function. The calling function then has to pass the struct along with the function. This feels similar to how C functions that take a function pointer often also take an extra void * parameter, which they pass back to the function when they call it.
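To convince myself of the struct-plus-function picture, here’s a hand-rolled version of the closure move |x| x + y next to the real thing. Everything here (AddY, apply_closure, apply_raw, closure_size) is a made-up name for this sketch, and the compiler’s actual generated type is anonymous.

```rust
use std::mem::size_of_val;

// Hand-written stand-in for the environment of `move |x| x + y`.
struct AddY {
    y: i32,
}

impl AddY {
    // The closure body, taking the environment as an explicit first argument.
    fn call(&self, x: i32) -> i32 {
        x + self.y
    }
}

// Rust-style caller: generic over anything callable as Fn(i32) -> i32.
fn apply_closure<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

// C-style caller: a plain function pointer plus a pointer to the
// environment, much like a callback that takes an extra void * argument.
fn apply_raw(f: fn(&AddY, i32) -> i32, env: &AddY, x: i32) -> i32 {
    f(env, x)
}

// Size of a closure that captures a single i32 by value.
fn closure_size() -> usize {
    let y: i32 = 100;
    let add_y = move |x: i32| x + y;
    let _ = add_y(1);
    size_of_val(&add_y)
}
```

Both apply_closure(move |x| x + y, 1) and apply_raw(AddY::call, &AddY { y }, 1) compute the same thing, and in practice the closure is the same size as the one i32 it captures, which lines up with a struct holding just y.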
I can create a small test by looking at the assembly of a function that returns a closure.
fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
    let y = 100;
    Box::new(move |x| x + y)
}
I’ll note here Rust generates some really verbose assembly if you leave optimizations turned off. For this example I intentionally set opt-level=3, which saved roughly half the instructions.
example::returns_closure:
push rax
mov edi, 4
mov esi, 4
call qword ptr [rip + __rust_alloc@GOTPCREL]
test rax, rax
je .LBB6_1
mov dword ptr [rax], 100
lea rdx, [rip + .L__unnamed_1]
pop rcx
ret
.LBB6_1:
mov edi, 4
mov esi, 4
call qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
ud2
.L__unnamed_1:
.quad core::ptr::drop_in_place
.asciz "\004\000\000\000\000\000\000\000\004\000\000\000\000\000\000"
.quad core::ops::function::FnOnce::call_once{{vtable.shim}}
.quad example::returns_closure::{{closure}}
.quad example::returns_closure::{{closure}}
There’s a bit of a difference in that it looks like it needs more than just the function pointer (for instance, how to drop the closure), but close enough.
But, unlike C, I can do things like |x| move |y| x + y, which is a curried function. Anything similar in C wouldn’t be nearly as terse.
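Spelled out with type annotations so it compiles on its own, the curried closure looks like this. The named function add and the wrapper curried_sum are just mine for illustration.

```rust
// Named version: returns a closure that has captured x by value.
fn add(x: i32) -> impl Fn(i32) -> i32 {
    move |y| x + y
}

// Closure version: the outer closure returns the inner one.
fn curried_sum() -> i32 {
    let add_closure = |x: i32| move |y: i32| x + y;
    let add_ten = add_closure(10); // partially applied
    add_ten(5) + add(1)(2)
}
```

The move on the inner closure is what lets it outlive the call that created it, since it takes its own copy of x instead of borrowing it.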
My questions about closures were similar to my questions about dynamic dispatch, and the answer is vtables, just like in the assembly above. It seems that unless you explicitly have dyn somewhere in your code you can assume vtables won’t be used, and expect monomorphization instead, which requires that the compiler can work out the exact function that will be called.
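A small side-by-side helped me keep the two straight. The Shape trait and everything around it is invented for this sketch.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Static dispatch: monomorphized, one copy per concrete S that is used.
fn area_static<S: Shape>(s: &S) -> f64 {
    s.area()
}

// Dynamic dispatch: a single copy that looks area up through the vtable.
fn area_dyn(s: &dyn Shape) -> f64 {
    s.area()
}

// A reference to a concrete type is one pointer; a reference to a trait
// object carries a second pointer, to the vtable.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&Square>(), size_of::<&dyn Shape>())
}
```

The doubled size of &dyn Shape is the vtable pointer travelling alongside the data pointer, which matches the table of function pointers in the assembly above.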
That last part begs another question: how does this work with shared libraries? For instance if I’m packaging ripgrep, and ripgrep is one of many packages that use the regex crate, I’ll want to package regex as a shared library and make that a ripgrep dependency. And if regex (for sake of discussion) provides a lot of generic trait implementations or functions, does that mean my packaged regex needs to include every possible monomorphized function from all combinations of types and functions it provides?
There doesn’t seem to be a practical answer, since neither Debian nor Arch appears to attempt to split up ripgrep into its dependencies. Considering Arch goes to great lengths to split Haskell packages up (xmonad is dependent on haskell-x11, haskell-setlocale and haskell-data-default-class) I’d assume they’d be doing the same for Rust if possible [2].
Closing Comments
When I started writing this I had a bunch of things I wanted to note, but as I wrote, more questions came up and I kept thinking, how do I demonstrate that I actually understand this? So I spent longer on some things and skipped a lot. This article is purely a byproduct of my learning, hence the unanswered questions and the errors that I’m sure I’ve made. I’m a long way from having learned Rust, and from having a project in mind to actually do something with it.
Although Rust isn’t C or Haskell, I can see the mix of influences and am interested in the recent addition of GATs to the language, even though I’m far from encountering the situations where I’ll need them.
[1] This isn’t original, see https://crates.io/crates/zot
[2] Arch’s package guidelines state, “While some Rust projects have external dependencies, most just use Rust ecosystem libraries that are statically embedded in the final binary”.