
Concurrency and Parallelism in Go

This is incomplete!!

A tutorial in concurrency and parallelism with the Go programming language.

When most people use the word “concurrent,” they’re usually referring to a process that occurs simultaneously with one or more processes. It is also usually implied that all of these processes are making progress at about the same time. Under this definition, an easy way to think about this is people. You are currently reading this sentence while others in the world are simultaneously living their lives. They are existing concurrently to you.


Why?

As with most paths toward understanding, we’ll begin with a bit of history. Let’s first take a look at how concurrency became such an important topic.

Moore’s Law, the Web, and the Current Mess

In 1965, Gordon Moore wrote a three-page paper that described both the consolidation of the electronics market toward integrated circuits and the doubling of the number of components in an integrated circuit every year for at least a decade. In 1975, he revised this prediction to state that the number of components on an integrated circuit would double every two years. This prediction more or less held until just recently—around 2012.

Several companies foresaw this slowdown in the rate predicted by Moore’s law and began to investigate alternative ways to increase computing power. As the saying goes, necessity is the mother of invention, and so it was in this way that multicore processors were born.

This looked like a clever way to solve the bounding problems of Moore’s law, but computer scientists soon found themselves facing down the limits of another law: Amdahl’s law, named after computer architect Gene Amdahl. Amdahl’s law describes a way in which to model the potential performance gains from implementing the solution to a problem in a parallel manner. Simply put, it states that the gains are bounded by how much of the program must be written sequentially.
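Stated as a formula (the formula itself isn’t given above, but this is the standard statement of Amdahl’s law): if a fraction p of a program’s work can be parallelized and that work is spread across s processors, the best possible speedup is

\[
\text{Speedup}(s) = \frac{1}{(1 - p) + \frac{p}{s}}
\]

As s grows, the speedup approaches 1/(1 - p), so the sequential portion sets a hard ceiling no matter how many cores you add.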

For example, imagine you were writing a program that was largely GUI based: a user is presented with an interface, clicks on some buttons, and stuff happens. This type of program is bounded by one very large sequential portion of the pipeline: human interaction. No matter how many cores you make available to this program, it will always be bounded by how quickly the user can interact with the interface. Now consider a different example: calculating the digits of pi. Thanks to a class of algorithms called spigot algorithms, this problem is called embarrassingly parallel, which—despite sounding made up—is a technical term that means that it can easily be divided into parallel tasks. In this case, significant gains can be made by making more cores available to your program, and your new problem becomes how to combine and store the results.

For problems that are embarrassingly parallel, it is recommended that you write your application so that it can scale horizontally. This means that you can run more instances of your program across more CPUs or machines, and the runtime of the system will improve as a result. Embarrassingly parallel problems fit this model so well because it’s very easy to structure your program in such a way that you can send chunks of a problem to different instances of your application.

Problems

Concurrent code is notoriously difficult to get right. It usually takes a few iterations to get it working as expected, and even then it’s not uncommon for bugs to exist in code for years before some change in timing (heavier disk utilization, more users logged into the system, etc.) causes a previously undiscovered bug to rear its head. Indeed, for this very book, I’ve gotten as many eyes as possible on the code to try and mitigate this.

Race Conditions

A race condition occurs when two or more operations must execute in the correct order, but the program has not been written so that this order is guaranteed to be maintained.

Most of the time, this shows up in what’s called a data race, where one concurrent operation attempts to read a variable while at some undetermined time another concurrent operation is attempting to write to the same variable.

Here’s a basic example:

var data int
go func() {
    data++ // one goroutine writes data...
}()
if data == 0 { // ...while this goroutine reads it, with no guaranteed ordering
    fmt.Printf("the value is %v.\n", data)
}

Depending on which operation wins the race, this program prints nothing, prints “the value is 0.”, or even prints “the value is 1.”

Atomicity

When something is considered atomic, or has the property of atomicity, this means that within the context that it is operating, it is indivisible, or uninterruptible.

So what does that mean, and why is this important to know when working with concurrent code?

The first and very important thing is the word “context.” Something may be atomic in one context, but not another. Operations that are atomic within the context of your process may not be atomic in the context of the operating system; operations that are atomic within the context of the operating system may not be atomic within the context of your machine, and operations that are atomic within the context of your machine may not be atomic within the context of your application. In other words, the atomicity of an operation can change depending on the currently defined scope. This fact can work both for and against you.

When thinking about atomicity, very often the first thing you need to do is to define the context, or scope, within which the operation will be considered atomic. Everything follows from this.

Now let’s look at the terms “indivisible” and “uninterruptible.” These terms mean that within the context you’ve defined, something that is atomic will happen in its entirety without anything happening in that context simultaneously. That’s still a mouthful, so let’s look at an example:

i++

This is about as simple an example as anyone can contrive, and yet it easily demonstrates the concept of atomicity. It may look atomic, but a brief analysis reveals several operations:

  • Retrieve the value of i.
  • Increment the value of i.
  • Store the value of i.

While each of these operations alone is atomic, the combination of the three may not be, depending on your context. This reveals an interesting property of atomic operations: combining them does not necessarily produce a larger atomic operation. Making the operation atomic is dependent on which context you’d like it to be atomic within. If your context is a program with no concurrent processes, then this code is atomic within that context. If your context is a goroutine that doesn’t expose i to other goroutines, then this code is atomic.
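The text above doesn’t show this, but if you do need the increment itself to be atomic in the context of multiple goroutines, Go’s sync/atomic package provides exactly that. A minimal sketch:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var i int64
    var wg sync.WaitGroup
    for j := 0; j < 1000; j++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            atomic.AddInt64(&i, 1) // the read-modify-write happens as one indivisible operation
        }()
    }
    wg.Wait()
    fmt.Println(i) // always prints 1000
}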

Deadlocks

A deadlocked program is one in which all concurrent processes are waiting on one another. In this state, the program will never recover without outside intervention. If that sounds grim, it’s because it is! The Go runtime attempts to do its part and will detect some deadlocks (all goroutines must be blocked, or “asleep”), but this doesn’t do much to help you prevent them. (There is an accepted proposal to allow the runtime to detect partial deadlocks, but it has not been implemented; for more information, see https://github.com/golang/go/issues/13759.)

To help solidify what a deadlock is, let’s look at an example:

type value struct {
    mu    sync.Mutex
    value int
}

var wg sync.WaitGroup
printSum := func(v1, v2 *value) {
    defer wg.Done()
    v1.mu.Lock()         // Enter the critical section for the incoming value.
    defer v1.mu.Unlock() // Use defer to exit the critical section before printSum returns.
    time.Sleep(2 * time.Second) // Sleep for a period of time to simulate work.
    v2.mu.Lock()
    defer v2.mu.Unlock()
    fmt.Printf("sum=%v\n", v1.value+v2.value)
}
var a, b value
wg.Add(2)
go printSum(&a, &b)
go printSum(&b, &a)
wg.Wait()

If you were to try and run this code, you’d probably see:

fatal error: all goroutines are asleep - deadlock!

Why? If you look carefully, you’ll see a timing issue in this code. Following is a graphical representation of what’s going on. The boxes represent functions, the horizontal lines calls to these functions, and the vertical bars lifetimes of the function at the head of the graphic.

(Figure: timing diagram of the two printSum calls, each locking its first value and then blocking while waiting for the lock the other holds.)

It seems pretty obvious why this deadlock is occurring when we lay it out graphically like that, but we would benefit from a more rigorous definition. It turns out there are a few conditions that must be present for deadlocks to arise, and in 1971, Edgar Coffman enumerated these conditions in a paper. The conditions are now known as the Coffman Conditions and are the basis for techniques that help detect, prevent, and correct deadlocks.

  • Mutual Exclusion: A concurrent process holds exclusive rights to a resource at any one time.
  • Wait For Condition: A concurrent process must simultaneously hold a resource and be waiting for an additional resource.
  • No Preemption: A resource held by a concurrent process can only be released by that process.
  • Circular Wait: A concurrent process (P1) must be waiting on a chain of other concurrent processes (P2), which are in turn waiting on it (P1).

Our example fulfills all four: each printSum call locks one value (mutual exclusion) and holds it while waiting for the other (wait for condition), only the holder can unlock a value (no preemption), and each goroutine is waiting on the other (circular wait).

Livelock

Livelocks are programs that are actively performing concurrent operations, but these operations do nothing to move the state of the program forward.

Have you ever been in a hallway walking toward another person? She moves to one side to let you pass, but you’ve just done the same. So you move to the other side, but she’s also done the same. Imagine this going on forever, and you understand livelocks.


Concurrency vs Parallelism

The fact that concurrency is different from parallelism is often overlooked or misunderstood. In conversations between many developers, the two terms are often used interchangeably to mean “something that runs at the same time as something else.” Sometimes using the word “parallel” in this context is correct, but usually, if the developers are discussing code, they really ought to be using the word “concurrent.”

The reason to differentiate goes well beyond pedantry. The difference between concurrency and parallelism turns out to be a very powerful abstraction when modeling your code, and Go takes full advantage of this. Let’s take a look at how the two concepts are different so that we can understand the power of this abstraction. We’ll start with a very simple statement:

Concurrency is a property of the code; parallelism is a property of the running program.

Don’t we usually think about these two things the same way? We write our code so that it will execute in parallel. Right?

Well, let’s think about that for a second. If I write my code with the intent that two chunks of the program will run in parallel, do I have any guarantee that will happen when the program is run? What happens if I run the code on a machine with only one core? Some may be thinking, It will run in parallel, but this isn’t true!

The chunks of our program may appear to be running in parallel, but they’re executing sequentially faster than is distinguishable. The CPU context switches to share time between different programs, and over a coarse enough granularity of time, the tasks appear to be running in parallel. If we were to run the same binary on a machine with two cores, the program’s chunks might be running in parallel.

This reveals a few interesting and important things. The first is that we do not write parallel code, only concurrent code that we hope will be run in parallel. Once again, parallelism is a property of the runtime of our program, not the code.

The second interesting thing is that we see it is possible—maybe even desirable—to be ignorant of whether our concurrent code is running in parallel. This is only made possible by the layers of abstraction that lie beneath our program’s model: the concurrency primitives, the program’s runtime, the operating system, the platform the operating system runs on (in the case of hypervisors, containers, and virtual machines), and ultimately the CPUs. These abstractions are what allow us to make the distinction between concurrency and parallelism, and ultimately what gives us the power and flexibility to express ourselves. We’ll come back to this.

CSP

CSP stands for “Communicating Sequential Processes,” which is both a technique and the name of the paper that introduced it. In 1978, Charles Antony Richard Hoare published the paper in the Association for Computing Machinery (more popularly referred to as ACM).

In this paper, Hoare suggests that input and output are two overlooked primitives of programming—particularly in concurrent code. At the time Hoare authored this paper, research was still being done on how to structure programs, but most of this effort was being directed to techniques for sequential code: usage of the goto statement was being debated (yeah funny history), and the object-oriented paradigm was beginning to take root. Concurrent operations weren’t being given much thought. Hoare set out to correct this, and thus his paper, and CSP, were born.

In the 1978 paper, CSP was only a simple programming language constructed solely to demonstrate the power of communicating sequential processes; in fact, he even says in the paper:

Thus the concepts and notations introduced in this paper should … not be regarded as suitable for use as a programming language, either for abstract or concrete programming.

Hoare was deeply concerned that the techniques he was presenting did nothing to further the study of the correctness of programs, and that the techniques might not be performant in a real language based on his own. Over the next six years, the idea of CSP was refined into a formal representation of something called process calculus, in an effort to take the ideas of communicating sequential processes and begin to reason about program correctness. Process calculus is a way to mathematically model concurrent systems; it also provides algebraic laws to perform transformations on these systems to analyze their various properties, e.g., efficiency and correctness.

To support his assertion that inputs and outputs needed to be considered language primitives, Hoare’s CSP programming language contained primitives to model input and output, or communication, between processes correctly (this is where the paper’s name comes from). Hoare applied the term processes to any encapsulated portion of logic that required input to run and produced output other processes would consume. Hoare probably could have used the word “function” were it not for the debate on how to structure programs occurring in the community when he wrote his paper.

For communication between the processes, Hoare created input and output commands: ! for sending input into a process, and ? for reading output from a process. Each command had to specify either an output variable (in the case of reading a variable out of a process) or a destination (in the case of sending input to a process). Sometimes these two would refer to the same thing, in which case the two processes would be said to correspond. In other words, output from one process would flow directly into the input of another process.

History has judged Hoare’s suggestion to be correct; however, it’s interesting to note that before Go was released, few languages brought support for these primitives into the language. Most popular languages favor sharing and synchronizing access to the memory to CSP’s message-passing style. There are exceptions, but unfortunately, these are confined to languages that haven’t seen wide adoption. Go is one of the first languages to incorporate principles from CSP in its core, and bring this style of concurrent programming to the masses. Its success has led other languages to attempt to add these primitives as well.
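Hoare’s notation isn’t reproduced here, but Go’s channel operations are the direct descendants of those two commands. As a rough analogue (my own illustration, not from the paper):

ch := make(chan int)
go func() {
    ch <- 42 // the counterpart of "!": send a value to whoever is reading the channel
}()
fmt.Println(<-ch) // the counterpart of "?": receive the value the other process sent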

Goroutine

Goroutines are one of the most basic units of organization in a Go program, so it’s important we understand what they are and how they work. Every Go program has at least one goroutine: the main goroutine, which is automatically created and started when the process begins. In almost any program, you’ll probably find yourself reaching for a goroutine sooner or later to assist in solving your problems. So what are they?

Goroutines are unique to Go (though some other languages have a concurrency primitive that is similar). They’re not OS threads, and they’re not exactly green threads—threads that are managed by a language’s runtime—they’re a higher level of abstraction known as coroutines. Coroutines are simply concurrent subroutines (functions, closures, or methods in Go) that are nonpreemptive—that is, they cannot be interrupted. Instead, coroutines have multiple points throughout which allow for suspension or reentry.

What makes goroutines unique to Go is their deep integration with Go’s runtime. Goroutines don’t define their suspension or reentry points; Go’s runtime observes the runtime behavior of goroutines and automatically suspends them when they block and then resumes them when they become unblocked. In a way, this makes them preemptable, but only at points where the goroutine has become blocked. It is an elegant partnership between the runtime and a goroutine’s logic. Thus, goroutines can be considered a special class of coroutine.

Coroutines, and thus goroutines, are implicitly concurrent constructs, but concurrency is not a property of a coroutine: something must host several coroutines simultaneously and allow each to execute—otherwise, they wouldn’t be concurrent.

Go’s mechanism for hosting goroutines is an implementation of what’s called an M:N scheduler, which means it maps M green threads to N OS threads. Goroutines are then scheduled onto the green threads. When we have more goroutines than green threads available, the scheduler handles the distribution of the goroutines across the available threads and ensures that when these goroutines become blocked, other goroutines can be run.
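None of this machinery is something you normally interact with directly, but the runtime package lets you peek at a few of the numbers involved. A small sketch (illustrative only):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    fmt.Println(runtime.NumCPU())       // logical CPUs on this machine
    fmt.Println(runtime.GOMAXPROCS(0))  // OS threads that may execute Go code simultaneously; 0 just reports the current value
    fmt.Println(runtime.NumGoroutine()) // goroutines that currently exist
}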

The go statement is how Go performs a fork, and the forked threads of execution are goroutines. Let’s return to our simple goroutine example:

sayHello := func() {
    fmt.Println("hello")
}
go sayHello()

Here, the sayHello function will be run on its own goroutine, while the rest of the program continues executing. In this example, there is no join point. The goroutine executing sayHello will simply exit at some undetermined time in the future, and the rest of the program will have already continued executing.

There is one problem with this example: as written, it’s undetermined whether the sayHello function will ever be run at all. The goroutine will be created and scheduled with Go’s runtime to execute, but it may not get a chance to run before the main goroutine exits.

Indeed, because we omit the rest of the main function for simplicity, when we run this small example, it is almost certain that the program will finish executing before the goroutine hosting the call to sayHello is ever started. As a result, you won’t see the word “hello” printed to stdout. You could put a time.Sleep after you create the goroutine, but recall that this doesn’t actually create a join point, only a race condition.

To create a join point, you have to synchronize the main goroutine and the sayHello goroutine. This can be done in a number of ways.

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    sayHello := func() {
        defer wg.Done()
        fmt.Println("hello")
    }
    wg.Add(1) // Add must happen before the goroutine is started, or Wait could return too early.
    go sayHello()
    wg.Wait()
}

sync

The sync package contains the concurrency primitives that are most useful for low-level memory access synchronization.

WaitGroup

WaitGroup is a great way to wait for a set of concurrent operations to complete when you either don’t care about the result of the concurrent operation, or you have other means of collecting their results. If neither of those conditions is true, I suggest you use channels and a select statement instead. WaitGroup is so useful, I’m introducing it first so I can use it in subsequent sections. Here’s a basic example of using a WaitGroup to wait for goroutines to complete:

import (
    "fmt"
    "sync"
)
func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Print("1")
    }()

    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Print("2")
    }()

    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Print("3")
    }()
    wg.Wait()
}

The output will probably differ on your machine: the three goroutines can be scheduled in any order, so the digits may print in any order.

Mutex and RWMutex

Mutex stands for “mutual exclusion” and is a way to guard critical sections of your program. A critical section is an area of your program that requires exclusive access to a shared resource. A Mutex provides a concurrent-safe way to express exclusive access to these shared resources. To borrow a Goism, whereas channels share memory by communicating, a Mutex shares memory by creating a convention developers must follow to synchronize access to the memory. You are responsible for coordinating access to this memory by guarding access to it with a mutex. Here’s a simple example of two goroutines that are attempting to increment and decrement a common value; they use a Mutex to synchronize:

func main() {
    var count int
    var lock sync.Mutex
    increment := func() {
        lock.Lock()
        defer lock.Unlock()
        count++
        fmt.Printf("Incrementing: %d\n", count)
    }
    decrement := func() {
        lock.Lock()
        defer lock.Unlock()
        count--
        fmt.Printf("Decrementing: %d\n", count)
    }
    // Increment
    var arithmetic sync.WaitGroup
    for i := 0; i <= 5; i++ {
        arithmetic.Add(1)
        go func() {
            defer arithmetic.Done()
            increment()
        }()
    }

    // Decrement
    for i := 0; i <= 5; i++ {
        arithmetic.Add(1)
        go func() {
            defer arithmetic.Done()
            decrement()
        }()
    }
    arithmetic.Wait()
    fmt.Println("Arithmetic complete.")
}

Critical sections are so named because they reflect a bottleneck in your program. It is somewhat expensive to enter and exit a critical section, and so generally people attempt to minimize the time spent in critical sections. One strategy for doing so is to reduce the cross-section of the critical section. There may be memory that needs to be shared between multiple concurrent processes, but perhaps not all of these processes will read and write to this memory. If this is the case, you can take advantage of a different type of mutex: sync.RWMutex. The sync.RWMutex is conceptually the same thing as a Mutex: it guards access to memory; however, RWMutex gives you a little bit more control: you can request a lock for reading, in which case you will be granted access unless the lock is being held for writing. This means an arbitrary number of readers can hold a reader lock as long as nothing else is holding a writer lock.
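The section doesn’t include an example, so here is a minimal sketch (my own illustration, not from the original text) of that extra control: any number of readers can hold the read lock at the same time, while a writer still gets exclusive access.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var state int
    var mu sync.RWMutex
    var wg sync.WaitGroup

    // The writer takes the exclusive lock.
    wg.Add(1)
    go func() {
        defer wg.Done()
        mu.Lock()
        defer mu.Unlock()
        state = 42
    }()

    // Many readers can hold the read lock simultaneously.
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.RLock()
            defer mu.RUnlock()
            fmt.Println("read:", state)
        }()
    }
    wg.Wait()
}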

Cond

The comment for the Cond type does a great job of describing its purpose:

…a rendezvous point for goroutines waiting for or announcing the occurrence of an event.

In that definition, an “event” is any arbitrary signal between two or more goroutines that carries no information other than the fact that it has occurred. Very often you’ll want to wait for one of these signals before continuing execution on a goroutine. If we were to look at how to accomplish this without the Cond type, one naive approach is to use an infinite loop:

for conditionTrue() == false {
}

However, this would consume all cycles of one core. To fix that, we could introduce a time.Sleep:

for conditionTrue() == false {
    time.Sleep(1*time.Millisecond)
}

This is better, but it’s still inefficient, and you have to figure out how long to sleep for: too long, and you’re artificially degrading performance; too short, and you’re unnecessarily consuming too much CPU time. It would be better if there were some kind of a way for a goroutine to efficiently sleep until it was signaled to wake and check its condition. This is exactly what the Cond type does for us. Using a Cond, we could write the example like:

c := sync.NewCond(&sync.Mutex{})
c.L.Lock()
for conditionTrue() == false {
    c.Wait()
}
c.L.Unlock()

Let’s expand on this example and show both sides of the equation: a goroutine that is waiting for a signal, and a goroutine that is sending signals. Say we have a queue of fixed length 2, and 10 items we want to push onto the queue. We want to enqueue items as soon as there is room, so we want to be notified as soon as there’s room in the queue. Let’s try using a Cond to manage this coordination:

c := sync.NewCond(&sync.Mutex{})
queue := make([]interface{}, 0, 10)
removeFromQueue := func(delay time.Duration) {
    time.Sleep(delay)
    c.L.Lock()
    queue = queue[1:]
    fmt.Println("Removed from queue")
    c.L.Unlock()
    c.Signal()
}
for i := 0; i < 10; i++{
    c.L.Lock()
    for len(queue) == 2 {
        c.Wait()
    }
    fmt.Println("Adding to queue")
    queue = append(queue, struct{}{})
    go removeFromQueue(1*time.Second)
    c.L.Unlock()
}

We also have a new method in this example, Signal. This is one of two methods that the Cond type provides for notifying goroutines blocked on a Wait call that the condition has been triggered. The other is a method called Broadcast. Internally, the runtime maintains a FIFO list of goroutines waiting to be signaled; Signal finds the goroutine that’s been waiting the longest and notifies it, whereas Broadcast sends a signal to all goroutines that are waiting. Broadcast is arguably the more interesting of the two methods as it provides a way to communicate with multiple goroutines at once.
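A Broadcast example isn’t included above, so here is a minimal sketch (my own illustration): three goroutines wait on the same Cond, and a single Broadcast wakes all of them.

c := sync.NewCond(&sync.Mutex{})
ready := false
var wg sync.WaitGroup
for i := 0; i < 3; i++ {
    wg.Add(1)
    go func(id int) {
        defer wg.Done()
        c.L.Lock()
        for !ready {
            c.Wait() // Wait releases the lock while suspended and reacquires it on wakeup.
        }
        c.L.Unlock()
        fmt.Printf("goroutine %d woke up\n", id)
    }(i)
}
time.Sleep(100 * time.Millisecond) // crude pause so the goroutines reach Wait; illustrative only
c.L.Lock()
ready = true
c.L.Unlock()
c.Broadcast() // wakes every goroutine blocked in Wait
wg.Wait()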


Once

Once is an object that will perform exactly one action. Sounds simple enough, what’s so useful about it then?

Well, for some reason this isn’t particularly well known, but if Do is called from several goroutines at once, every caller blocks until the one execution of the supplied function completes. This makes sync.Once incredibly useful when performing relatively expensive operations that you would typically cache in a map.

Say for example you have a popular website that hits a backend API that isn’t particularly fast, so you decide to cache API results in-memory with a map. A naive solution might look like this:

package main

import (
    "io/ioutil"
    "net/http"
    "net/url"
    "sync"
)

var cache = make(map[string][]byte)
var mutex = new(sync.Mutex)

func DoQuery(name string) []byte {
    // Check if the result is already cached.
    mutex.Lock()
    if cached, found := cache[name]; found {
        mutex.Unlock()
        return cached
    }
    mutex.Unlock()

    // Make the request if it's uncached.
    resp, _ := http.Get("https://upstream.api/?query=" + url.QueryEscape(name))
    // Error handling and resp.Body.Close omitted for brevity.
    result, _ := ioutil.ReadAll(resp.Body)

    // Store the result in the cache.
    mutex.Lock()
    cache[name] = result
    mutex.Unlock()

    return result
}

Well what happens if there are two calls to DoQuery that happen simultaneously? The calls would race, neither would see the cache is populated, and both would perform the HTTP request to upstream.api unnecessarily, when only one would need to complete it.

Well, you can do something like this:

package main

import (
    "io/ioutil"
    "net/http"
    "net/url"
    "sync"
)

type CacheEntry struct {
    data []byte
    wait <-chan struct{}
}

var cache = make(map[string]*CacheEntry)
var mutex = new(sync.Mutex)

func DoQuery(name string) []byte {
    // Check if the operation has already been started.
    mutex.Lock()
    if cached, found := cache[name]; found {
        mutex.Unlock()
        // Wait for it to complete. Remember that receiving from a closed
        // channel returns the zero value without blocking.
        <-cached.wait
        return cached.data
    }

    wait := make(chan struct{})
    entry := &CacheEntry{
        wait: wait,
    }
    cache[name] = entry
    mutex.Unlock()

    // Make the request if it's uncached.
    resp, _ := http.Get("https://upstream.api/?query=" + url.QueryEscape(name))
    // Error handling and resp.Body.Close omitted for brevity.
    entry.data, _ = ioutil.ReadAll(resp.Body)

    // Signal that the operation is complete; receives on the closed channel return immediately.
    close(wait)

    return entry.data
}

That’s good and all but the code’s readability has taken a hit. It’s not immediately clear what’s going on with cached.wait and the flow of operations under different situations is not very intuitive.

Let’s try again with sync.Once:

package main

import (
    "io/ioutil"
    "net/http"
    "net/url"
    "sync"
)

type CacheEntry struct {
    data []byte
    once *sync.Once
}

var cache = make(map[string]*CacheEntry)
var mutex = new(sync.Mutex)

func DoQuery(name string) []byte {
    mutex.Lock()
    entry, found := cache[name]
    if !found {
        // Create a new entry if one does not exist already.
        entry = &CacheEntry{
            once: new(sync.Once),
        }
        cache[name] = entry
    }
    mutex.Unlock()

    // Only the first caller executes the function; concurrent callers
    // block inside Do until that execution completes.
    entry.once.Do(func() {
        resp, _ := http.Get("https://upstream.api/?query=" + url.QueryEscape(name))
        // Error handling and resp.Body.Close omitted for brevity.
        entry.data, _ = ioutil.ReadAll(resp.Body)
    })

    return entry.data
}

That’s it. This achieves the same as the previous example, but is now much easier to understand (at least in my opinion). There is only a single return, and the code flows intuitively from top to bottom without having to read and understand what’s going on with the entry.wait channel as before.

Concurrency Patterns in Go

We’ve explored the fundamentals of Go’s concurrency primitives and discussed how to properly use these primitives. In this chapter, we’ll do a deep-dive into how to compose these primitives into patterns that will help keep your system scalable and maintainable.

However, before we get started, we need to touch upon the format of some of the patterns contained in this chapter. In a lot of the examples, we’ll be using channels that pass empty interfaces (interface{}) around.

Confinement

When working with concurrent code, there are a few different options for safe operation. We’ve gone over two of them:

  • Synchronization primitives for sharing memory (e.g., sync.Mutex)
  • Synchronization via communicating (e.g., channels)

However, there are a couple of other options that are implicitly safe within multiple concurrent processes:

  • Immutable data
  • Data protected by confinement

In some sense, immutable data is ideal because it is implicitly concurrent-safe. Each concurrent process may operate on the same data, but it may not modify it. If it wants to create new data, it must create a new copy of the data with the desired modifications. This allows not only a lighter cognitive load on the developer but can also lead to faster programs if it leads to smaller critical sections (or eliminates them altogether). In Go, you can achieve this by writing code that utilizes copies of values instead of pointers to values in memory.

Confinement is the simple yet powerful idea of ensuring information is only ever available from one concurrent process. When this is achieved, a concurrent program is implicitly safe and no synchronization is needed. There are two kinds of confinement possible: ad hoc and lexical.

Ad Hoc

Ad hoc confinement is when you achieve confinement through a convention, whether it be set by the language’s community, the group you work within, or the codebase you work within. In my opinion, sticking to convention is difficult to achieve on projects of any size unless you have tools to perform static analysis on your code every time someone commits new code. Here’s an example of ad hoc confinement that demonstrates why:

import "fmt"
func main() {
    data := make([]int, 4)
    loopData := func(handleData chan<- int) {
        defer close(handleData)
        for i := range data {
            handleData <- data[i]
        }
    }
    handleData := make(chan int)
    go loopData(handleData)
    for num := range handleData {
        fmt.Println(num)
    }
}

We can see that the data slice of integers is available from both the loopData function and the loop over the handleData channel; however, by convention we’re only accessing it from the loopData function.

Lexical

Lexical confinement involves using lexical scope to expose only the correct data and concurrency primitives for multiple concurrent processes to use. It makes it impossible to do the wrong thing.

package main

import (
    "fmt"
    "time"
)

func main() {
    chanOwner := func() <-chan int {
        results := make(chan int, 5)
        go func() {
            defer close(results)

            for i := 0; i <= 5; i++ {
                time.Sleep(1 * time.Second)
                results <- i
            }
        }()

        return results
    }

    consumer := func(results <-chan int) {
        fmt.Println(results)

        for result := range results {
            fmt.Printf("Received: %d\n", result)
        }

        fmt.Println("Done receiving!")
    }

    fmt.Println("initiate channel")
    results := chanOwner()

    fmt.Println("consumer ready - OK")
    consumer(results)
}

Why pursue confinement if we have synchronization available to us? The answer is improved performance and reduced cognitive load on developers. Synchronization comes with a cost, and if you can avoid it you won’t have any critical sections, and therefore you won’t have to pay the cost of synchronizing them. Note, though, that lexical confinement can increase memory usage in examples like this: larger types, which would normally be passed as pointers, end up being copied instead.

For-Select

Obviously.
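Since this section is a stub, here is a minimal sketch of the for-select loop the heading refers to (my own illustration, not from the original): combine a for loop with a select so a goroutine can keep doing its work while still noticing a cancellation signal. The done and stringStream channels are assumed to have been created elsewhere.

for _, s := range []string{"a", "b", "c"} {
    select {
    case <-done:
        return // stop immediately if cancellation has been signaled
    case stringStream <- s: // otherwise send the next piece of work
    }
}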

Preventing Goroutine Leaks

When you are dealing with a memory leak in Go, there is a very good chance that you are dealing with leaked goroutines.

The goroutine has a few paths to termination:

  • When it has completed its work.
  • When it cannot continue its work due to an unrecoverable error.
  • When it’s told to stop working.

We get the first two paths for free—these paths are your algorithm—but what about work cancellation? This turns out to be the most important bit because of the network effect: if you’ve begun a goroutine, it’s most likely cooperating with several other goroutines in some sort of organized fashion. We could even represent this interconnectedness as a graph: whether or not a child goroutine should continue executing might be predicated on knowledge of the state of many other goroutines. The parent goroutine (often the main goroutine), with this full contextual knowledge, should be able to tell its child goroutines to terminate. Let’s start with a simple example of a goroutine leak:

func main() {
    doWork := func(strings <-chan string) <-chan interface{} {
        completed := make(chan interface{})
        go func() {
            defer fmt.Println("doWork exited.")
            defer close(completed)
            for s := range strings {
                // Do something interesting
                fmt.Println(s)
            }
        }()
        return completed
    }
    doWork(nil)
    // Perhaps more work is done here
    fmt.Println("Done.")
    fmt.Println(runtime.NumGoroutine())
}

Here we see that the main goroutine passes a nil channel into doWork. Therefore, the strings channel will never actually have any strings written onto it, and the goroutine containing doWork will remain in memory for the lifetime of this process (we would even deadlock if we joined the goroutine within doWork and the main goroutine).

In this example, the lifetime of the process is very short, but in a real program, goroutines could easily be started at the beginning of a long-lived program. In the worst case, the main goroutine could continue to spin up goroutines throughout its life, causing creep in memory utilization.

The way to successfully mitigate this is to establish a signal between the parent goroutine and its children that allows the parent to signal cancellation to its children. By convention, this signal is usually a read-only channel named done. The parent goroutine passes this channel to the child goroutine and then closes the channel when it wants to cancel the child goroutine. Here’s an example:

func main() {
    doWork := func(
        done <-chan interface{},
        strings <-chan string,
    ) <-chan interface{} {
        terminated := make(chan interface{})
        go func() {
            defer fmt.Println("doWork exited.")
            defer close(terminated)
            for {
                select {
                case s := <-strings:
                    // Do something interesting
                    fmt.Println(s)
                case <-done:
                    return
                }
            }
        }()
        return terminated
    }
    done := make(chan interface{})
    terminated := doWork(done, nil)
    go func() {
        // Cancel the operation after 1 second.
        time.Sleep(1 * time.Second)
        fmt.Println("Canceling doWork goroutine...")
        close(done)
    }()
    <-terminated
    fmt.Println("Done.")
}

Notes

  • Buffered channels can be useful, but only reach for them when you have a good reason to: buffering can hide race conditions that an unbuffered channel would surface immediately (see the sketch below).
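A minimal sketch of the difference (my own illustration): a buffered channel lets sends complete before anyone is ready to receive, which is exactly how it can paper over an ordering problem that an unbuffered channel would expose.

buffered := make(chan int, 2) // an unbuffered channel would be make(chan int)

buffered <- 1 // does not block: there is room in the buffer
buffered <- 2 // does not block: the buffer is now full
// A third send would block until something receives; on an unbuffered
// channel even the first send would block until a receiver was ready.
fmt.Println(<-buffered, <-buffered) // prints: 1 2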




I seek refuge in God, from Satan the rejected. Generated by: Emacs 29.4 (Org mode 9.6.17). Written by: Salih Muhammed, by the date of: 2022-09-30 Fri 09:17. Last build date: 2024-07-13 Sat 07:46.