As people who know me know, my current favourite language is go. One of the best features of go is the lack of features. This is actually the reason I preferred C over most scripting languages for a long time – it does not overburden you with language-features that you first have to wrap your head around. You don’t have to think for a while about what classes or modules or whatever you want to have, you just write your code down and the (more or less) entire language can easily fit inside your head. One of the best writeups of this (contrasting it with python) was done by Gustavo Niemeyer in a blogpost a few years back.
So when I say there are a few things popping up that I miss about go, this does not mean I wish for them to be included. I subjectively miss them, and it would definitely make me happy if they existed. But I still very much like the go devs for prioritizing simplicity over making me happy.
So let’s dig in.
Generics
So let’s get this elephant out of the room first. I think this is the most frequently named feature missing from go. Generics are asked for so often that they have their own entry in the go FAQ. The usual answers range from “maybe they will get in” to “I don’t understand why people want generics, go has generic programming using interfaces”. To illustrate one shortcoming of the (current) interface approach, consider writing a (simple) graph-algorithm:
// Graph is an adjacency list: g[i] holds the indices of the neighbors of node i.
type Graph [][]int

func DFS(g Graph, start int, visitor func(int)) {
    visited := make([]bool, len(g))
    var dfs func(int)
    dfs = func(i int) {
        if visited[i] {
            return
        }
        visitor(i)
        visited[i] = true
        for _, j := range g[i] {
            dfs(j)
        }
    }
    dfs(start)
}
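Just to make the representation concrete, calling it on a small three-node graph could look like this (the graph and the Example function are of course made up for illustration):

func Example() {
    g := Graph{
        {1, 2}, // node 0 points to 1 and 2
        {2},    // node 1 points to 2
        {0},    // node 2 points back to 0
    }
    DFS(g, 0, func(i int) {
        fmt.Println("visiting", i)
    })
}

which visits the nodes in the order 0, 1, 2.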
The DFS above uses an adjacency list to represent the graph and does a recursive depth-first search on it. Now imagine you want to implement this algorithm generically (granted, a DFS is not really hard enough to justify this, but you could just as easily have a more complex algorithm). This could be done like this:
type Node interface{}

type Graph interface {
    Neighbors(Node) []Node
}

func DFS(g Graph, start Node, visitor func(Node)) {
    visited := make(map[Node]bool)
    var dfs func(Node)
    dfs = func(n Node) {
        if visited[n] {
            return
        }
        visitor(n)
        visited[n] = true
        for _, n2 := range g.Neighbors(n) {
            dfs(n2)
        }
    }
    dfs(start)
}
This seems simple enough, but it has a lot of problems. For example, we lose type-safety: even if we write Neighbors(Node) []Node, there is no way to tell the compiler that these instances of Node will always actually be the same type. So an implementation of the graph interface would have to do type-assertions all over the place. Another problem is:
type AdjacencyList [][]int

func (l AdjacencyList) Neighbors(n Node) []Node {
    i := n.(int) // the type-assertion the compiler cannot check for us
    var result []Node
    for _, j := range l[i] {
        result = append(result, j) // every int is boxed into an interface value
    }
    return result
}
An implementation of this interface as an adjacency list actually performs pretty badly, because it cannot return an []int but must return a []Node, and even though int satisfies Node, []int is not assignable to []Node (for good reasons that lie in the implementation of interfaces, but still). The way to solve this is to always map your nodes to integers. This is what the standard library does in the sort-package; it is exactly the same problem. But it might not always be possible, let alone straightforward, to do this for graphs, for example if they do not fit into memory (e.g. a web-crawler). The answer is to have the caller maintain this mapping via a map[Node]int or something similar, but… meh.
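To spell out what that sort-style approach would look like (this is my own sketch, not an API from any existing package): the caller describes the graph purely in terms of integer indices and keeps the mapping from its real node type to those indices to itself.

// Sketch of an int-indexed graph interface, analogous to sort.Interface.
type IndexedGraph interface {
    NumNodes() int
    Neighbors(i int) []int
}

func DFS(g IndexedGraph, start int, visitor func(int)) {
    visited := make([]bool, g.NumNodes())
    var dfs func(int)
    dfs = func(i int) {
        if visited[i] {
            return
        }
        visitor(i)
        visited[i] = true
        for _, j := range g.Neighbors(i) {
            dfs(j)
        }
    }
    dfs(start)
}

This avoids both the type-assertions and the []Node allocations, but it pushes the bookkeeping of the node-to-index mapping onto every caller.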
Weak references
I have to admit that I am not sure my use case here is a really important or even very nice one, but let’s assume I want to have a database abstraction that transparently handles pointer-indirection. So let’s say I have two tables T1 and T2, and T2 has a foreign key referencing T1. I think it would be pretty neat if a database abstraction could automatically deserialize this into a pointer to a T1-value A. But to do this, we would a) need to be able to recognize A in a later Put (so if the user changes A and later stores it, the database knows which row in T1 to update) and b) hand out the same pointer if another row in T2 references the same id.
The only way I can think of to do this is to maintain a map[Id]*T1 (or something similar), but this would prevent the handed-out values from ever being garbage-collected. Even though there are hacks that would allow some use cases for weak references to be emulated, I don’t see how they would work here.
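To make the problem a bit more tangible, here is a minimal sketch of what such a cache might look like (the names Id, T1, DB and Get are all made up for illustration):

type Id int64

type T1 struct {
    Id Id
    // ... more columns ...
}

type DB struct {
    // As long as an id is in this map, the *T1 it points to can never be
    // garbage-collected, even if no caller holds a reference to it anymore.
    // A weak map would solve exactly this.
    cache map[Id]*T1
}

func (db *DB) Get(id Id) *T1 {
    if v, ok := db.cache[id]; ok {
        return v // same row, same pointer
    }
    if db.cache == nil {
        db.cache = make(map[Id]*T1)
    }
    v := &T1{Id: id} // ... fill the remaining fields from the database ...
    db.cache[id] = v
    return v
}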
So, as in the case of generics, this mainly means that some elegant APIs are not possible in go for library authors (and as I said, in this specific case it probably isn’t a very good idea anyway. For example, you would have to think about what happens if the user gets the same value from the database in two different goroutines).
Dynamic loading of go code
It would be useful to be able to dynamically load go code at runtime, to build plugins for go software. Specifically, I want a good go replacement for jekyll, because I went through some ruby-version-hell with it lately (for example, jekyll serve -w still does not work for me with the version packaged in debian), and I think a statically linked go-binary would take a lot of possible pain-points out of this. But plugins are a really important feature of jekyll for me, so I still want to be able to customize a page with plugins (how to avoid introducing the same version hell with this is another topic).
The currently recommended ways to do plugins are a) writing them as go-packages and recompiling the whole binary for every change of a plugin, and b) running them as sub-processes and talking to them via net/rpc.
I don’t feel a) is a good fit here, because it means maintaining a separate binary for every jekyll-site you have, which just sounds like a smallish nightmare for binary distribution (plus I have use cases for plugins where even the relatively small compilation times of go would result in an intolerable increase in startup-time).
b) on the other hand results in a significant runtime-penalty: for example, I can not really pass interfaces between plugins, let alone use channels or something, and every function call has to have its parameters and results serialized and deserialized. Where in the same process I can just define a transformation between different formats as a func(r io.Reader) io.Reader or something, in the RPC-context I first have to transmit the entire file over a socket, or have the plugin-author implement a net/rpc server himself and somehow pass a reference to it over the wire. This increases the burden on the plugin-authors too much, I think.
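As a rough sketch of the kind of in-process API I mean (Transform and applyAll are made-up names, not from any actual project): a plugin could simply register a func(io.Reader) io.Reader and the host could chain them directly.

// A hypothetical in-process plugin hook: a plugin is a transformation
// that turns one document stream into another.
type Transform func(r io.Reader) io.Reader

// The host can chain any number of such transformations without anything
// being serialized or sent over a socket in between.
func applyAll(transforms []Transform, r io.Reader) io.Reader {
    for _, t := range transforms {
        r = t(r)
    }
    return r
}

With the net/rpc approach, each of these calls would instead mean streaming the whole document to the plugin-process and back.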
Luckily, there seems to have been some thought put forward recently on how to implement this, so maybe we will see it in the nearish future.
Garbage-collected goroutines
Now, this is the one thing where I really don’t understand why it is not part of the language. Concurrency in go is a first-class citizen, and garbage-collection is a feature emphasized all the time by the go-authors as an advantage. Yet the two do not seem to play entirely well together, making concurrency worse than it has to be.
The standard example of how goroutines and channels work goes a little bit like this:
func Foo() {
    ch := make(chan int)
    go func() {
        i := 0
        for {
            ch <- i
            i++
        }
    }()
    for {
        fmt.Println(<-ch)
    }
}
Now, this is all well and good, but what if we want to exit the loop prematurely? We have to do something like this:
func Foo() {
    ch := make(chan int)
    done := make(chan bool)
    // Closing done is what actually lets the producer goroutine exit
    // once we stop reading from ch.
    defer close(done)
    go func() {
        i := 0
        for {
            select {
            case ch <- i:
                i++
            case <-done:
                return
            }
        }
    }()
    for {
        i := <-ch
        if i > 1000 {
            break
        }
        fmt.Println(i)
    }
}
Because otherwise the goroutine would just stay around for all eternity, effectively being leaked memory. There are entire talks built around this and similar problems, and I don’t really understand why. If we add a break to our first version, Foo returns and suddenly all references to ch, except the one the goroutine is blocking on writing to, are gone and can be garbage-collected. The runtime can already detect if all goroutines are sleeping and we have a deadlock, and the garbage-collector can accurately see what references there are to a given channel, so why can we not combine the two to just see “there is absolutely no way this channel-write can ever succeed, so let’s just kill the goroutine and gc all its memory”? This would have zero impact on existing programs (because, as you can not get any references to goroutines, a deadlocked one can have no side-effect on the rest of the program), but it would make channels so much more fun to work with. It would make channels as iterators a truly elegant pattern, it would simplify pipelines and it would possibly allow a myriad of other use cases for channels I can not think of right now. Heck, you could even think about (not sure if this is possible or desirable) running any deferred statements when a goroutine is garbage-collected, so all other resources held by it would be correctly released.
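To illustrate what this would buy us: with garbage-collected goroutines, a channel-based iterator could be as simple as this (Fib and PrintSmallFibs are just made-up examples), with the caller free to stop reading whenever it wants.

// Fib returns a channel producing the Fibonacci numbers.
func Fib() <-chan int {
    ch := make(chan int)
    go func() {
        a, b := 0, 1
        for {
            ch <- a
            a, b = b, a+b
        }
    }()
    return ch
}

func PrintSmallFibs() {
    // Today, breaking out of this loop leaks the producer goroutine;
    // with garbage-collected goroutines it would simply be cleaned up.
    for n := range Fib() {
        if n > 1000 {
            break
        }
        fmt.Println(n)
    }
}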
This is the one thing I really wish would be added to the language. Really diving into channels and concurrency is very much spoiled for me right now, because I always have to think about draining every channel, about which goroutine closes which channels, about passing cancellation-channels around…