Surprise at Golang Thread Scheduling

 

by Philip Winder

A few days ago I had a problem with high CPU usage in one of my Go-based microservices. The microservice has evolved into two distinct components: an HTTP web app and a batch processing service. At some point, we’ll probably split these out. But in its current guise, we were seeing HTTP request latencies of greater than ten seconds. It turns out that the cause was the Go scheduler not scheduling the goroutine that receives HTTP requests. Read on to find out why.

First a quick recap. Go doesn’t directly expose OS threads. It has a concept built into the syntax of the language called a “goroutine”, which seems a lot like a thread. Goroutines are scheduled by the Go scheduler onto a bounded pool of OS threads, and those OS threads are in turn spread across the cores of your CPU. (I’m intentionally ignoring hyperthreading.)

Let’s run a simple experiment to test this out. The code below has two goroutines. One is an “intensive” task and the other is a print-line. We also force the runtime to only use a single thread. You can play with this code here.
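A minimal sketch of such an experiment might look like the following. It is a reconstruction under assumptions, not the original listing: the helper runExperiment, the parameter n, and the use of sync.WaitGroup and sync/atomic (to wait for both goroutines and to avoid a data race) are choices made for this sketch, whereas the post itself mentions a 1 nanosecond sleep in main. The order of the two output lines depends on the Go version and on scheduler internals; since Go 1.14 the runtime can also preempt tight loops asynchronously, so newer toolchains may not reproduce the output reported below.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// cpuIntensive is the "intensive" task: a tight loop with no blocking calls.
// It publishes its progress through *p.
func cpuIntensive(p *int64, n int64) {
	for i := int64(1); i <= n; i++ {
		atomic.StoreInt64(p, i)
	}
	fmt.Println("Done intensive thing")
}

// printVar is the print-line task. It returns the value it saw so that
// callers can inspect it.
func printVar(p *int64) int64 {
	seen := atomic.LoadInt64(p)
	fmt.Printf("print x = %d.\n", seen)
	return seen
}

// runExperiment (hypothetical helper) starts both goroutines with
// GOMAXPROCS set to 1 and waits for them to finish. It returns the value
// the printer saw and the final value of x.
func runExperiment(n int64) (seen, final int64) {
	prev := runtime.GOMAXPROCS(1) // force a single thread, as in the post
	defer runtime.GOMAXPROCS(prev)

	var x int64
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); cpuIntensive(&x, n) }()
	go func() { defer wg.Done(); seen = printVar(&x) }()
	wg.Wait() // assumption: the post used a 1 ns sleep here instead
	return seen, atomic.LoadInt64(&x)
}

func main() {
	runExperiment(10000000)
}
```

The only thing this sketch guarantees is that the counter ends at n once both goroutines are done; what the print-line goroutine observes is exactly the part that the scheduler decides.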

If you run this code, you will see that it results in:

Done intensive thing
print x = 10000000.

Note how the “CPU intensive thing” ran and finished first. This is because the Go scheduler has to be invoked before it can switch goroutines, and nothing in the tight loop invokes it. If we add a call to runtime.Gosched() inside the for loop and run the code again, we get the following output:

print x = 1.
Done intensive thing

This time, we have allowed the scheduler to reschedule the tasks (and provide the print-line goroutine with some CPU time).
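The change can be sketched as follows, again as an illustration rather than the original code. The names cpuIntensiveYielding and runYielding are hypothetical, and sync.WaitGroup and sync/atomic are assumptions of this sketch. The only essential line is the runtime.Gosched() call inside the loop, which yields the processor so that other runnable goroutines get a turn.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// cpuIntensiveYielding counts to n like the tight loop in the experiment,
// but hands control back to the scheduler on every iteration.
func cpuIntensiveYielding(p *int64, n int64) {
	for i := int64(1); i <= n; i++ {
		atomic.StoreInt64(p, i)
		runtime.Gosched() // yield so other runnable goroutines get a turn
	}
	fmt.Println("Done intensive thing")
}

// runYielding (hypothetical helper) runs the yielding loop next to a
// print-line goroutine with GOMAXPROCS set to 1. It returns the value the
// printer saw and the final value of x.
func runYielding(n int64) (seen, final int64) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var x int64
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); cpuIntensiveYielding(&x, n) }()
	go func() {
		defer wg.Done()
		seen = atomic.LoadInt64(&x)
		fmt.Printf("print x = %d.\n", seen)
	}()
	wg.Wait()
	return seen, atomic.LoadInt64(&x)
}

func main() {
	// A smaller n than a tight loop would use: yielding on every
	// iteration adds a scheduler round trip per step.
	runYielding(100000)
}
```

Yielding on every iteration is the bluntest option. In a real batch job it would be more usual to yield every few thousand iterations, trading a little latency for much less scheduling overhead.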

Note that the 1 nanosecond sleep in the main routine is there purely for its internal Gosched call, which hands control to the scheduler. The main routine does not finish before all the other code has had time to run (try it yourself via the link above).

Apparently, the Go runtime and standard library are littered with points at which the scheduler gets to run, so whenever you call something like time.Sleep or fmt.Printf, the equivalent of Gosched will be called for you. Let’s try a more realistic example, decoding some JSON.

If you run this code, you will see the same issue. This is because the JSON Decode method DOES NOT have a Gosched inside its code.
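A sketch of what such a JSON experiment could look like is shown here. The record type, the buildPayload and decodeAll helpers and the payload size are all invented for illustration, and the sync.WaitGroup is an assumption of this sketch. The CPU-bound work is a single json.Decoder.Decode call over a large in-memory document, running next to a print-line goroutine with GOMAXPROCS set to 1. As before, the order of the output depends on the Go version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// record is a hypothetical payload type, invented for this sketch.
type record struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// buildPayload returns a JSON array holding n records.
func buildPayload(n int) string {
	var sb strings.Builder
	sb.WriteByte('[')
	for i := 0; i < n; i++ {
		if i > 0 {
			sb.WriteByte(',')
		}
		fmt.Fprintf(&sb, `{"id":%d,"name":"item-%d"}`, i, i)
	}
	sb.WriteByte(']')
	return sb.String()
}

// decodeAll is the CPU-bound task: it decodes the whole payload in one
// Decode call and reports how many records it found.
func decodeAll(payload string) (int, error) {
	var records []record
	dec := json.NewDecoder(strings.NewReader(payload))
	if err := dec.Decode(&records); err != nil {
		return 0, err
	}
	return len(records), nil
}

func main() {
	runtime.GOMAXPROCS(1) // single thread, as in the first experiment
	payload := buildPayload(200000)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		n, err := decodeAll(payload)
		fmt.Println("Done decoding, records:", n, "error:", err)
	}()
	go func() {
		defer wg.Done()
		fmt.Println("print-line goroutine ran")
	}()
	wg.Wait()
}
```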

Compared to other languages that use OS threads, this is surprising. I didn’t expect to have to call the scheduler myself and assumed there was some internal process that would manage it for me. Not so.

The interesting thing is that even if you set GOMAXPROCS to something more than 1, it still occurs. I think this is because the playground only provides the equivalent of one CPU, and the scheduling is on a per-CPU basis.

In summary, for each CPU, Golang is always going to act like it’s running on a single thread, unless you call Gosched yourself, call something that does, or the goroutine ends. It doesn’t matter how many threads you specify or how many goroutines you start; it will still do one CPU-intensive thing per CPU at a time.

Edit:

I received a few comments from readers with more information. Thank you to Cale Hoopes, Sam Whited and Dave Cheney.

GOMAXPROCS is the environment variable that limits the number of OS threads that can execute Go code simultaneously. This is independent of the discussion about calling Gosched, which in some high-performance situations you will need to insert into your code. Note that the vast majority of the time, the internal calls to Gosched will be adequate.
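To check what a given process is actually running with, the runtime package can be queried directly. A small sketch, where the helper name schedulerInfo is made up for this example: calling runtime.GOMAXPROCS with an argument below 1 returns the current setting without changing it, and runtime.NumCPU reports the number of logical CPUs available to the process.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo (hypothetical helper) reports the current GOMAXPROCS
// setting and the number of logical CPUs usable by this process.
func schedulerInfo() (procs, cpus int) {
	// An argument below 1 queries the setting without modifying it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	procs, cpus := schedulerInfo()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", procs, cpus)
}
```

Running the program with the environment variable set, for example GOMAXPROCS=1 go run main.go, should report a GOMAXPROCS of 1 regardless of how many CPUs the machine has.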

More information:

https://golang.org/pkg/runtime/
https://github.com/golang/go/blob/master/src/runtime/proc.go
https://golang.org/s/go11sched
https://github.com/golang/go/issues/11462


Philip Winder

Phil Winder is a multi-disciplinary consultant architect who specialises in the research and development of cutting-edge technology. His expertise lies between the boundaries of software development and machine learning. Describing himself simply as an “Engineer”, he has 10 years’ experience in a wide range of engineering disciplines (7 years software and machine learning, 3 years production electronics). Phil has Ph.D. and Masters degrees from the University of Hull, U.K. These were in Electronics, with a focus on embedded signal processing.


6 Comments

  1. “Compared to other languages that use OS threads, this is surprising. I didn’t expect to have to call the scheduler myself and assumed there was some internal process that would manage it for me. ”
    – A goroutine isn’t an OS thread, it is much, much, much lighter. Part of that lightness and speed is precisely because it is cooperative multitasked instead of preemptively so.

    • Agreed. I’m thinking top-level here, comparing it to traditional languages with an accepted idea of what a thread is.

    • Yes, this is the same issue. “well-known” being the operative word. It’s well known to everyone that knows about it. 🙂

  2. > Note that the sleep for 1 nanosecond is used purely for the internal Gosched call. The main routine does not finish before all the other code has had time to run (try it yourself via the link above).

    Doesn’t seem to be guaranteed. On my MacBook with Go 1.8, the main routine doesn’t normally wait for the subroutines if the delay is less than a couple of microseconds. Replacing it with runtime.Gosched() does the trick.

  3. Go’s scheduler is preemptive.

    … but it is based on syscalls … it is not based on time (as an OS thread scheduler is).

    That said, a Go processor can block a goroutine if it is doing a syscall.

    This will print 100000 every time:
    func DoNothing() {}

    func cpuIntensive(p *int64) {
        for i := int64(1); i <= 100000; i++ {
            DoNothing() // << Function call .. Not a preemption point
            *p = i
        }
        fmt.Println("\nDone intensive thing\n")
    }

    This will print a random number from 1 to 100000:
    func cpuIntensive(p *int64) {
        for i := int64(1); i <= 100000; i++ {
            fmt.Printf("") // << Possible preemption point.
            *p = i
        }
        fmt.Println("\nDone intensive thing\n")
    }

    Sidenote: As I experienced (in this example), the loop bound needs to be greater than 100000, otherwise the scheduler will let the for loop finish without preemption.
