La Vita è Bella

2017-02-11

Some Go memory notes

I recently did some work on a backend server written in Go, and one of the tasks was to reduce its memory usage: the old code used ~110GB of memory steadily (this was not a memory leak), mainly because we used to strive for coding speed, not efficiency. With the help of Go's standard pprof tool (which is wonderful), we managed to lower the memory usage to less than 15GB steadily. Here are some notes I learnt from the process:

You don't always want to read io.Reader into []byte

When you are passing data around in Go, it usually comes in one of two forms: either an io.Reader (or sometimes an io.ReadCloser, which embeds io.Reader), or a []byte.

[]byte is something more permanent: It just sits there, you can do whatever you want with it. io.Reader, on the other hand, is more fragile: You can only read it once, and it's gone after that (unless you saved what you just read).

Because of that, sometimes it's very tempting to read that io.Reader you just got from some function into []byte (using ioutil.ReadAll) to save it permanently.

But that comes with a price: you need memory to save that []byte you just read. Do you really need that?

Most of the functions you need to pass the data on to will gladly accept io.Reader instead of []byte as the parameter, and there's a good reason for that: it's always trivial to create an io.Reader out of a []byte (both bytes.NewReader and bytes.NewBuffer can achieve that goal; I ran some benchmarks and there's no significant performance difference between them, but I still slightly prefer bytes.NewReader, because Occam's razor). If you only need to pass the data to a single function to process once, there's no need to read it into []byte and then convert it back into an io.Reader again.
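
For example, a minimal sketch (process here is a made-up stand-in for any function that accepts an io.Reader):

import (
  "bytes"
  "io"
  "io/ioutil"
)

// process stands in for any consumer that takes an io.Reader.
func process(r io.Reader) error {
  _, err := io.Copy(ioutil.Discard, r)
  return err
}

func handle(data []byte) error {
  // bytes.NewReader doesn't copy data, so the wrapper is essentially free.
  return process(bytes.NewReader(data))
}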

Sometimes even if you need to process the data in two different functions, you still don't need to call ioutil.ReadAll. One common use case: you have the content of a file in the form of an io.Reader, and you want to detect its content type with http.DetectContentType before sending the file to your user. You don't need the whole content for that: http.DetectContentType only needs the first 512 bytes. So you can wrap your reader in a bufio.Reader, Peek the first 512 bytes for http.DetectContentType, and then send the bufio.Reader itself on to your user.
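
Here's a minimal sketch of that (serveContent is a hypothetical name; error handling kept short):

import (
  "bufio"
  "io"
  "net/http"
)

func serveContent(w http.ResponseWriter, content io.Reader) error {
  br := bufio.NewReader(content)
  // Peek doesn't consume anything: br still yields the full content
  // from the beginning afterwards.
  head, err := br.Peek(512)
  if err != nil && err != io.EOF {
    return err
  }
  // http.DetectContentType is fine with fewer than 512 bytes, too.
  w.Header().Set("Content-Type", http.DetectContentType(head))
  _, err = io.Copy(w, br)
  return err
}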

Beware of []byte in JSON

In Go's JSON package, []byte is "encoded as a base64-encoded string". This is a very reasonable design choice, but if you are using JSON to pass large chunks of []byte between machines, you might want to think twice.
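
To see it in action, a quick sketch:

import (
  "encoding/json"
  "fmt"
)

type Payload struct {
  Data []byte `json:"data"`
}

func demo() {
  b, _ := json.Marshal(Payload{Data: []byte("hello")})
  fmt.Println(string(b)) // prints {"data":"aGVsbG8="}
}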

On the sender's side, you first need to hold the whole chunk of []byte in memory. Then, during the JSON encoding, you need to hold both the original chunk and the whole encoded base64 string, which is larger than the original by design (base64 turns every 3 bytes into 4 characters, so it's roughly 33% bigger). Congratulations, you just more than doubled your memory consumption.

It's the same on the receiver's side. First you need to read the whole JSON string, including the very large base64 string, into memory (Go's JSON decoding is not a streaming-friendly operation; Go's base64 decoding is streaming friendly, but that doesn't help here). During the decoding, it allocates another chunk of memory to hold your original []byte. So in the end you'll spend the same amount of memory as the sender, or more.

If you really need to pass big chunks of []byte between machines, JSON is not your friend. You should use a binary, streaming protocol instead. Both gRPC and Thrift will do that job nicely, and both have good Go support.
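
Just to show the flavor of the streaming approach without pulling in gRPC or Thrift: even sending raw bytes over plain HTTP already avoids the base64 round trip. A minimal sketch (sendBlob and the file path are made up):

import (
  "io"
  "net/http"
  "os"
)

func sendBlob(w http.ResponseWriter, req *http.Request) {
  f, err := os.Open("/path/to/blob") // hypothetical file
  if err != nil {
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
  }
  defer f.Close()
  w.Header().Set("Content-Type", "application/octet-stream")
  // io.Copy streams chunk by chunk; neither side ever holds the
  // whole payload (or a base64 copy of it) in memory at once.
  io.Copy(w, f)
}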

Avoid "adding" strings directly

Imagine you have a function that joins strings together like this (no, you don't really want to implement this yourself: strings.Join already does it for you; this is just a simple example):

func JoinStrings(parts []string) string {
  var s string
  for _, part := range parts {
    s += part
  }
  return s
}

Every time you do s +=, Go actually allocates a totally new, bigger string, copies the previous s over, then appends the new part at the end. This wastes both CPU (on the copying) and memory.

Instead, you can use bytes.Buffer, which works like StringBuilder in Java:

import "bytes"

func JoinStrings(parts []string) string {
  var buf bytes.Buffer
  for _, part := range parts {
    buf.WriteString(part)
  }
  return buf.String()
}
