La Vita è Bella

Saturday, February 11, 2017

Some Go memory notes

I recently did some work on a backend server written in Go, and one of the work is to reduce the memory usage, as the old code will use ~110GB of memory steadily (this is not memory leak), mainly because of we used to strive for the speed of coding, not efficiency. With the help of Go's standard pprof tool (which is wonderful), we managed to lowered memory usage down to less than 15GB steadily. Here are some notes I learnt from the process:

You don't always want to read io.Reader into []byte

When you are passing some data around, they usually come in one of the two forms in Go: either an io.Reader (or sometimes io.ReadCloser, which is kind of compatible with io.Reader), or []byte.

[]byte is something more permanent: It just sits there, you can do whatever you want with it. io.Reader, on the other hand, is more fragile: You can only read it once, and it's gone after that (unless you saved what you just read).

Because of that, sometimes it's very tempting to read that io.Reader you just got from some function into []byte (using ioutil.ReadAll) to save it permanently.

But that comes with a price: you need memory to save that []byte you just read. Do you really need that?

Most of the functions you need to pass the data on will gladly accept io.Reader instead of []byte as the parameter, and there's a good reason for that: It's always trivial to create an io.Reader out of []byte (both bytes.NewReader and bytes.NewBuffer can achieve that goal, I did some benchmark test and there're no significant performance difference between them, but I still slightly prefer bytes.NewReader, because Occam's razor). If you only need to pass the data to a single function to process once, there's no need to read it into []byte and convert it back into io.Reader again.

Sometimes even if you need to process the data in two different functions, you still don't need to call ioutil.ReadAll. One common use case is that you have a content of a file in the form of io.Reader, and you want to get the content type first using http.DetectContentType before sending that file to your user. You don't need the whole content of the file for that. http.DetectContentType only needs the first 512 bytes. So you could just wrap your reader with bufio.Reader, then Peek 512 bytes for http.DetectContentType, after that you can just send your bufio.Reader to the function to your user.

Beware of []byte in JSON

In Go's JSON package, []byte is "encoded as a base64-encoded string". This is a very reasonable design choice, but if you are using JSON to pass large chunk of []byte between machines, you might want to think twice.

On sender's side, you first need to save the whole chunk of []byte into memory, then when doing the JSON encoding, you will need to save both the original chunk of []byte, and the whole encoded base64 string, which is larger than your original chunk by design. Congratulations, you just more than doubled your memory consumption.

It's the same on receiver's side. At first you need to save the whole JSON string, including the very large base64 string, into memory (because Go's JSON decoding is not really a streaming friendly operation, although Go's base64 decoding is streaming friendly, that doesn't help in this case). During the decoding, it will allocate another chunk of memory to save your original []byte. So in the end you'll spend the same or more memory as the sender side.

If you really need to pass big chunk []byte between machines, JSON is not your friend. You should use some binary and streaming protocol instead. Both gRPC and thrift will do that job nicely, and they both have good Go support.

Avoid "adding" strings directly

Imagine you have a function to join strings together like this (No you don't really want to implement this. strings.Join already implemented this feature for you. This is just a simple example):

func JoinStrings(parts []stringstring {
  var s string
  for _, part := range parts {
    s += part
  }
  return s
}

Every time you do s +=, Go will actually allocate a totally new, bigger string, copy previous s over, then append new part at the end. This wastes both CPU (for copying) and memory.

Instead, you can use bytes.Buffer, which works like StringBuilder in Java:

func JoinStrings(parts []stringstring {
  var buf bytes.Buffer
  for _, part := range parts {
    buf.WriteString(part)
  }
  return buf.String()
}



tags: , , ,

02:52:29 by fishy - dev - Permanent Link

no comments yet - no trackbacks yet - karma: 1 [+/-]

Thursday, November 24, 2016

OneKey

This is an open source project I started 5 years ago, and just "finished" recently.

Back in 2011, my friend tension7 released a Chrome extension called One Password. It has a simple idea: using HMAC-MD5 to generate unique passwords for different sites.

I thought that was a good idea, and started to use it myself. (The idea is actually called "deterministic password manager/generator", and there are some discussions about it recently. I agree that it doesn't make your password stronger, but it do works better in certain cases. I use it on some sites myself, but not all of them.) I wrote a python version of it so that when I can use it easily while I'm not on Chrome.

And I also started writing an Android app for it. This is the Android app.

The sad thing is just that there're too many distractions in life, so I never found time to finish it. I made it "usable" at some point of 2013 or 2014, put it on my phone, and just put it away.

But I suddenly had some time due to various reasons recently, so I finally finished it. By "finish it" I mean I did a 1.0 release and listed it on Google Play, 5 years after I started the project.

There are still some TODOs, for example I would like to add the feature to save the master password with fingerprint, but I didn't figure out a good way to design the UI to make it work. But it's mostly feature complete, with highlights:

On the technical side, I chose Buck as the building system for this project, because it was the build system used by my day job project, and it worked great. I also tried to move it from Buck to Bazel, but Android's incompatible with Java8 is causing a lot of problems. I will probably switch to Bazel after that is sorted out.

Anyway, if you found the idea of deterministic password generator appealing to you (at least in certain cases), please give it a try. I hope you enjoy it.



tags: , , ,

21:54:18 by fishy - opensource - Permanent Link

no comments yet - no trackbacks yet - karma: -3 [+/-]

Friday, December 11, 2015

Let's Encrypt!

Thanks to Let's Encrypt, this blog is now serving via https (and https only):

Screenshot of https certificate

In the process of enabling https, I also switched my host from Dreamhost to Google Cloud, and switched to nginx as httpd. (And Dreamhost announced Let's Encrypt support after I made the switch)

The only problem with Let's Encrypt is that the certificate is only valid for 90 days (ok no support of wildcard domain might also be a problem, but I don't feel it), which means I need to renew my certificates often. Luckily that can be done via a monthly (or bi-monthly) cron job.

This is the code snippet of my nginx configuration to make both https only and Let's Encrypt ACME verification work:

server {
        listen 80;
        listen [::]:80;

        server_name    yuxuan.org www.yuxuan.org wang.yuxuan.org;

        location /.well-known/acme-challenge/ {
                alias /var/www/challenges/.well-known/acme-challenge/;
                try_files $uri =404;
        }

        location / {
                return 301 https://$host$request_uri;
        }
}

And this is the script to be put into crontab (I use the official client from Debian experimental):

/usr/bin/letsencrypt certonly --renew-by-default --webroot -w /var/www/challenges -d yuxuan.org -d www.yuxuan.org -d wang.yuxuan.org

That's it! Please consider donating to Let's Encrypt!



tags: , , , ,

23:26:36 by fishy - linux - Permanent Link

no comments yet - no trackbacks yet - karma: 10 [+/-]

Thursday, November 10, 2011

I tried to migrate from Aperture to Lightroom but rolled back

UPDATE: with Aperture 4 Beta released with geotag support, I finally migrated to Lightroom (4), and will buy it on Day 1!

I bought my first "RAW camera" at the end of 2009, and found that iPhoto is not enough for me. So I purchased Apple Aperture 3 at February 2010, almost immediately after it's released. One reason is that I was used to iPhoto and Aperture carried on most of the things, another reason is that Adobe Photoshop Lightroom 3 is on its way but still about half year away at that time.

I'm just an amateur photographer, snap some photos for my life record and show them on Flickr. I don't sell my photos for life (but will be glad if you're interested :P). So I don't need something extremely professional to process my photos. Generally I just use the "Auto Enhance" in Aperture to tune my photos, and custom white balance sometimes. I rarely use other tunes in Aperture. So generally I'm happy with Aperture.

But there're still something drive me crazy, like the slowness, and some bugs. I was upset about some of the bugs, so serious considered to migrate to Lightroom. I downloaded Lightroom trial and borrowed a USB HD to move my photos (~80GB) from Aperture library into Lightroom, and seriously used it for a couple of days.

But, there's always a "but", Lightroom is not perfect and there're still something I don't like. And I found out that the "bugs" drive me crazy is actually something wrong with my Mac, not Aperture. So I finally decided to cool down and not to invest another $150 on Lightroom, and keep on Aperture, at least for now.

Here're some of my comparisons.

What Lightroom did great?

First I'd like to talk about some of Lightroom's advantages, and of course, they are Aperture's disadvantages:

What Aperture did great?

There're still something Aperture did better than Lightroom. Most of them are just useless to professionals, but they are good for amateurs:

Between speed and geotag, I chose geotag. But I do hope Apple can match Lightroom's speed, or Adobe can add native geotag support to Lightroom. I just hate Aperture's slowness, and Lightroom's lack of geotag support. For now, I can only suffer the slow.



tags: , , ,

15:01:42 by fishy - mac - Permanent Link

no comments yet - no trackbacks yet - karma: 146 [+/-]

Saturday, January 22, 2011

Fix epubs

The wonderful epub reader for Android, Aldiko, released a major update recently. Besides some great new features/improvements, it also broke something: all Chinese characters in Pro Git Chinese version can't be displayed. But it didn't simply broke Chinese epubs at all. My other Chinese epubs are just fine.

I contacted Aldiko for help, they just replied claiming that I need to embed Chinese font into epub for them to display, and Adobe Digital Edition have the same problem. I tried my epubs with Adobe Digital Edition and it's really the same. But after I looked into my good and broken epubs (they are simply zip archives with metadata, html, css and pictures inside), I can't find embedded Chinese font (in the good ones, of course) at all.

Instead, I found something else interesting: all my good epubs have an "xml:lang" parameter set to "zh-CN" in every html tag, and the broken one didn't.

So I add that parameter to every html file in progit-zh.epub, and repack the file, now it works (in both Aldiko and Adobe Digital Edition)!

To simplify this work, I wrote a python script to do that automatically. It will unpack the epub file into a temp directory, add "xml:lang" parameter to every html file if didn't present, and then pack the epub back again. It will also output the replaced lines into stderr, like:

./progit_split_000.html:
<html xmlns="http://www.w3.org/1999/xhtml">
->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN">

Get it from my script repo on GitHub, and change the LANG constant if needed.

If you have other ideas about it, please let me know.



tags: , , ,

21:28:08 by fishy - i18n.zh - Permanent Link

no comments yet - no trackbacks yet - karma: 23 [+/-]

Monday, January 10, 2011

Purge Strikes Back to Android!

A Little Bit History

Back in 2005, after graduation from college, I got my first job at TrendMicro. After I got paid for enough money, I went to the store and brought a Palm Treo 650 home. That's the dream phone of that time. It have good hardware spec and more important, PalmOS.

My Treo 650 was broken at 2007. The radio module is died. So I can't use it as a phone, but can still use it as a PDA. I've made a nature choice: bought a Treo 680 to replace it. The reason is quite simple: it have PalmOS inside.

But the bad thing is I lost my Treo 680 at the end of 2007, left it on a table of a restaurant and never got it back again. This time I didn't buy Centro (but I recommended it to my then roommate :P). I decided to try something new. So I bought a Symbian phone (Nokia E51), Symbian looks like something comes from Stone Age compare to Palm OS. Then I bought a Windows Mobile phone (Sony Ericsson Xperia X1), but realized later that I do hate Windows Mobile. I bought a Samsung Galaxy (not to be confused with Samsung Galaxy S) at 2009, as I thought Android was mature enough to use. And I do love Android! But the lack of support and upgrade from Samsung makes me unhappy.

Finally, Google released Nexus One and I ordered one at the first time. This becomes the first phone after Treo 680 that I'm still happy after using it for half year, and I'm still happy with it while it's nearly a whole year already for now.

But...

But there's still something I missed from Palm, like "Purge":

Purge

(I dug out my broken Treo 650 today, blow away the dusts, insert the battery, and it still works! so comes this photo.)

"Purge" is a simple feature to do the cleanup work: delete all old messages older than N days. Yes Android is much more powerful compare to Palm OS in the old days, and it can keep "almost everything" (actually for SMS, it will keep only the recent 200 messages per conversation and automatically delete old ones by default). But I don't like to keep too many meaningless "everything", I prefer the old Palm way.

So I took the last weekend to make Purge Strikes Back!

Purge Strikes Back!

YOU HAVE BEEN WARNED!

Something I must mention is that for SMS deletion, I used some undocumented API. What did this mean? It means that this app may or may NOT work for you. It may or may NOT work as you expected. It may DELETE ALL YOUR SMS even you didn't mean to. It works on my Nexus One with Froyo and system Message app, it may not work with other phones, Android versions or Message apps. And it may break anytime even it works fine for you now. You take it totally on your own risk.

The call log deletion is implemented with system API, which should works fine, but I took no responsibility for your unexpected data loss, either.

So you'd better BACKUP YOUR SMS BEFORE TRYING IT. Personally I'm using SMS Backup + to do this for me and it works fine (it also backups call logs, too). There may be better ones but this is a choice you can consider.

Again, YOU HAVE BEEN WARNED!

How to Get It

Download it from its GitHub repo. It's free. You can also get the source code there, licensed under GPL v3.

I'll publish it to the Android Market in the coming days.

What to Expect in the Future

Acknowledgement

Some UI elements took from another awesome open source Android app Financisto, which is licensed under GPL v2.

Link

project homepage



tags: , , ,

23:53:07 by fishy - opensource - Permanent Link

no comments yet - no trackbacks yet - karma: 35 [+/-]

Wednesday, October 13, 2010

Say "Goodbye and Thank you" to POPFile

POPFile is an awesome mail classifier works as a local POP3 proxy. It's written in Perl and use a web UI so it can run on any platform.

I started using it back from about 2004. First I use it to filter spams, later to filter work mail, personal mail and important mails apart. Time moves on, as I stop using POP3 on Gmail, the main job for POPFile became to picking not-so-important script generated reports from other work mails (combined with Thunderbird filter based on "X-Text-Classification" header).

As I also use Gmail for work mail now at Google, I have only a few POP3 accounts that rarely get some non-spam mails. So it's finally the time to say goodbye to POPFile. As a record, here's the final statistics for my POPFile history, since Jul. 2007, the last time I reset the stats:

Messages classified: 135,631,
Classification errors: 910,
Accuracy: 99.32%

That's very impressive result. Thank you POPFile for so many years of service. If you are still using POP3 account, you should try it right now!

BTW, I always hope Gmail can auto label my mails, just like POPFile. Finally we got priority inbox, but that's not enough for me.



tags: , , ,

22:54:43 by fishy - opensource - Permanent Link

no comments yet - no trackbacks yet - karma: 30 [+/-]

Tuesday, March 02, 2010

More tips about my projtags.vim

I wrote a vim plugin named projtags, initially for loading tags file for projects. But as I can "set tags+=tags;" in my vimrc, it's not that useful on the tags file way. I expanded its feature to support commands for per projects, and this is much more useful :P Here are some examples:

Case 1: different coding styles

I did some squid hacking before. My coding style is as below:

set cindent
set autoindent
set smartindent
set tabstop=8
set softtabstop=8
set shiftwidth=8
set noexpandtab
set smarttab
set cino=:0,g0,t0

But squid coding style is different. They use 4 for shiftwidth. As hacking some project, you should always follow the original coding style. But add vim modeline into every file is also unacceptable. So I use projtags.vim to do this job:

let g:ProjTags += [["~/work/squid", ":set sw=4"]]

So that every time I'm editing a file within squid, vim will use 4 instead of 8 for shiftwidth, now I'm happy with both my own codes and squid codes.

Case 2: use ant as makeprg for java projects

First, I use a autocmd to set ant as makeprg for java files:

autocmd BufNewFile,BufRead *.java :setlocal makeprg=ant\ -s\ build.xml

It works well for java files. But as I did some Android development these days, I also have lots of xml files to edit (and then make to see the result). I can't use such a autocmd for xml's, as not all xml's are in a java project. So projtags.vim is again the answer:

let g:ProjTags += [["~/work/pmats", ":set mp=ant\\ -s\\ build.xml"]]

So I can always use ":make" for ant under my Android project now, no matter it's .java or .xml.

My vimrc segment for tags and projtags

set tags+=/usr/local/include/tags
set tags+=/usr/include/tags
set tags+=/opt/local/include/tags
set tags+=tags;

let g:ProjTags = []
let g:ProjTags += [["~/work/squid", ":set sw=4"]]
let g:ProjTags += [["~/work/pmats", ":set mp=ant\\ -s\\ build.xml"]]

Patches for projtags.vim (e.g. Windows support, I haven't tested it but I guess it won't work under Windows now) are welcomed :D You can get the code from the git repo for my various scripts.



tags: , , ,

19:58:01 by fishy - dev - Permanent Link

6 comments - no trackbacks yet - karma: 30 [+/-]

Sunday, October 11, 2009

Modified Pwitter to support 3rd party twitter API

Pwitter is a good open source Twitter client for Mac. I just added supporting for 3rd party twitter API to it to make it better.

You can get my patch here or my build here.

There's no GUI for setting 3rd party twitter API address. Instead, you need to change the configuration file. You can use either Property List Editor or the Terminal "defaults write" way. Here's an example:

$ defaults write com.aki.Pwitter twitter_https -bool NO
$ defaults write com.aki.Pwitter twitter_base "teewt.yhsif.com"

To restore the default settings ( https and "twitter.com" ), you can just delete the custom settings by:

$ defaults delete com.aki.Pwitter twitter_https
$ defaults delete com.aki.Pwitter twitter_base

It's useful when you can't connect to twitter.com directly, like me in China. Enjoy it.



tags: , , , ,

22:06:13 by fishy - opensource - Permanent Link

no comments yet - no trackbacks yet - karma: 54 [+/-]

Tuesday, April 28, 2009

A script for Columbus V-900 GPS

Columbus V-900 is so far the GPS that best fit my requirements: it can log, it can take waypoints while logging, and it (can) have a big storage for logging (via TF card). I got one from WC, a friend to give it a try.

It log GPS tracks to CSV format, WC wrote a script to convert it to the GPX format, but without waypoints. So I rewrote a Python script (as I'm not so familiar with perl), to add the waypoints to the GPX file.

Get the script here, it will use the voice record filename as the name of the waypoint if available, or otherwise just "Waypoint #N". You may want to edit the converted GPX file to rename the waypoints.



tags: , , , , ,

11:47:57 by fishy - dev - Permanent Link

no comments yet - 1 trackback - karma: 60 [+/-]

May the Force be with you. RAmen