Activities Compared to Threads, Event Handlers, etc.

Charcoal introduces a new interactive programming primitive called activities. As cooperative threads are a kind of compromise between events and threads, activities are a kind of compromise between preemptive and cooperative threads. From an implementation perspective, activities are cooperative threads with some implicit yields thrown in here and there. From a developer's perspective, activities feel more like preemptive threads with less insanity about when preemption might happen. The Charcoal designer's hope is that activities combine the software engineering strengths of cooperative and preemptive threads with few of their weaknesses.

Activities Compared to Threads

Probably the most obvious difference between threads and activities is one that I don't think is all that important in the end. Threads can be run in parallel; activities can't. That seems like a hugely consequential difference; how can I say it isn't important? It's because I don't think threads should be used to parallelize most applications anyway, so parallel execution isn't a useful feature there. Parallelization should be done with task-oriented frameworks like Cilk, Microsoft's PPL, Intel's TBB, Apple's GCD, etc.

So if we set aside parallelization (for example, just pretend we're running everything on a single-processor machine), what's the difference between threads and activities? Threads are fully preemptive, whereas activities are preemptive only at a coarser granularity, and activities make it easy to temporarily "disable interrupts" in a safe way.

Full preemption is the source of much evil for threads. Optimizing compilers and modern architectures like to rearrange memory accesses in all kinds of funky ways, which is why data races are such a thorny problem for high-level language memory models (Java, C/C++). Between activities there are no data races. The language forbids the implementation from making it appear that memory accesses have crossed yield boundaries.

It may seem like the prohibition against memory accesses crossing yield boundaries would be a big performance problem, but I don't think it will be. (This is all very speculative at the moment.) For example, consider loop invariant code motion:

void function_with_loop( int *p, int *q )
{
    size_t i;
    for( i = 0; i < ...; ++i )
    {
        ... *p ... *q = ...
    }
}

If a compiler can satisfy itself that p and q aren't aliases for anything else accessed by this code, it would love to transform it into:

void function_with_loop( int *p, int *q )
{
    size_t i;
    int pval = *p, qval = *q;
    for( i = 0; i < ...; ++i )
    {
        ... pval ... qval = ...
    }
    *q = qval;
}

But in Charcoal a for loop really looks like this:

void function_with_loop( int *p, int *q )
{
    size_t i;
    int pval = *p, qval = *q;
    for_no_yield( i = 0; i < ...; ++i )
    {
        ... pval ... qval = ...
        yield;
    }
    *q = qval;
}

So it seems like the loop invariant code motion optimization might violate the rules by making accesses to *p and *q appear to cross yield boundaries. My hope is that the compiler will be able to see into yield and insert fix-up code like this:

void function_with_loop( int *p, int *q )
{
    size_t i;
    int pval = *p, qval = *q;
    for_no_yield( i = 0; i < ...; ++i )
    {
        ... pval ... qval = ...
        if( "actually yield (very unlikely)" )
        {
            *q = qval;
            actually_yield;
            pval = *p;
            qval = *q;
        }
    }
    *q = qval;
}

This does add complexity to the compiler and inflates code size somewhat, but I don't think either of those costs should be huge. It should have very little impact on run time, because actually yielding should happen so infrequently.

Interesting side note: Charcoal actually does have threads, though most application code shouldn't use them most of the time. In Charcoal, threads are containers for activities, similarly to how processes are containers for threads in most modern systems. Most activities should run in a single thread, but there are some reasons for using multiple threads. You need to be super careful with them, kind of like signal handling in normal C/C++.

Activities Compared to Cooperative Threads

In terms of implementation, activities are quite similar to cooperative threads.

More text here.

Activities Compared to Goroutines

Goroutines are the primary concurrency primitive in the language Go, designed and implemented by some folks at Google. Out of all the dozens of concurrency primitives that have been proposed and implemented over the years, I decided to compare activities to goroutines first, because they are both interesting hybrids of preemptive and cooperative threads (and Go is getting a fair amount of attention these days).

Naturally, I think activities have some advantages compared to goroutines.

Because goroutines can run in parallel, Go is (in principle) just as exposed to data races as threads in C/C++. I believe the main counterpoint Go enthusiasts would make is that Go has a snazzy message passing system, and well-written Go programs shouldn't use shared memory much, if at all. I do not believe there is anything in the language that actually prevents memory from being shared between goroutines (I'll have to revise this argument heavily if I'm wrong about that).

I don't find this line of thinking compelling for two reasons:

So in my view, shared memory is a fact of life that has to be dealt with. Activities have the following really nice properties:

Activities Compared to Concurrency Primitive X

Lots of variations on the "Big Four" (processes, events, threads, cooperative threads) have been proposed over the years. If you would like to see my take on how activities compare to any of these, let me know. I'll do my best to write something and update here.

More Junk

For Language Designers/Enthusiasts

For Programmers

For Computer Architects

Many researchers of various stripes, but especially architects, assume that all programmers ever wanted from a concurrency framework was sequential consistency. It is certainly the case that SC is more intuitive than the monstrous relaxed memory models that have been invading programming language definitions recently. However, I think that SC is still too low-level to be useful for many everyday programming situations.

SC says that individual memory operations can be arbitrarily interleaved. This is terrible. Activities are more coarse-grained by default and give the programmer simple tools to exert more control.

Note that I am not trying to take a shot at architecture researchers. Rather, because of the nature of the problem they're interested in (how to program shared-memory multi-processor architectures) they are naturally drawn to (conventional/preemptive) threads, which combine shared memory and parallel execution. By giving up parallelism, we can achieve a much more intuitive concurrency model. (Wait! I care about parallelism!)

TBD

Note: For the time being, the designers of Charcoal are not interested in processor parallelism at all. Concurrency and parallelism are different, and Charcoal is about concurrency.


Creative Commons License
Charcoal Programming Language by Benjamin Ylvisaker is licensed under a Creative Commons Attribution 4.0 International License