Hashicorp Plugin System Design and Implementation

zeroFruit
14 min read · Mar 5, 2022

When using Hashicorp products, we can find that there is always a plugin system that users can build on to extend the product's functionality. With Terraform, one of the most famous products, we can use not only well-known cloud provider modules such as AWS and GCP but also our own custom modules. Vault ships with many built-in secrets engines, but we can develop our own and, by attaching it to Vault, extend the ways we store our secrets.

Likewise, each product exposes an interface that tells users which methods must be implemented to act as one of its plugins. Once users implement that interface, they can easily extend the product's functionality by registering their implementation with it.

In this post, we are going to look at the Hashicorp plugin system that is used across their products.

In the ‘Hashicorp Plugin Abstract Behavior’ section, we look at an overview of how the plugin system works. In the ‘Hashicorp Plugin Design’ section, we cover how the plugin system is designed and what I personally see as its key characteristics. In the ‘Hashicorp Plugin Implementation’ section, we walk through how the plugin system is implemented in detail. In the ‘Hashicorp Plugin Behavior in Terraform’ section, we see how the plugin system works in the Terraform context. Lastly, in the ‘ETC’ section, I share a few things that impressed me while reading the code of the plugin modules.

Hashicorp Plugin Abstract Behavior

The main service, which hosts the Hashicorp plugin system, executes the plugin service as a separate process. The main service communicates with the plugin service through RPC. Because the main service runs the plugin as a child process, a crashed plugin service has no impact on the main service.

There are two types of RPC client/server that Hashicorp supports. One is based on Go's native net/rpc package, the other on gRPC. Hashicorp products use the gRPC client/server for transporting data between the main service and the plugin service. I think this is because gRPC puts no restriction on the programming language users choose to implement a plugin, and gRPC is battle-tested across many use cases, so its stability and performance are well established.
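
For reference, in the real hashicorp/go-plugin module this transport choice is just configuration on the serving side. The following is a minimal sketch, not production code: the handshake values are placeholders, and pluginMap stands for a map of plugin implementations like the ones we build later in this post.

// Sketch only: serving a plugin over gRPC with github.com/hashicorp/go-plugin.
// The handshake values below are placeholders, and pluginMap is assumed to
// contain gRPC-capable plugin implementations.
plugin.Serve(&plugin.ServeConfig{
	HandshakeConfig: plugin.HandshakeConfig{
		ProtocolVersion:  1,
		MagicCookieKey:   "EXAMPLE_PLUGIN",
		MagicCookieValue: "example",
	},
	Plugins:    pluginMap,
	GRPCServer: plugin.DefaultGRPCServer, // omit this to fall back to net/rpc
})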

In this post, we are going to use net/rpc as the example, because setup and testing are a little easier and it is sufficient for understanding the big picture of the plugin system. For this post, I refactored some of the code into a project named powerstrip, removing unnecessary parts and dependencies, so we can easily run it and focus on the core parts.

In the plugin system, when the main service starts, it executes the plugin service's binary. The two processes then communicate over a Unix domain socket (we will see later how its address is exchanged), speaking an RPC protocol over it.

Fig. 1. In the plugin system, main-service execs plugin-service and communicates with it via RPC
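
To make Fig. 1 concrete, here is a minimal sketch of the main-service side of that handshake. This is my own illustration, not powerstrip code; ./plugin/greeter is assumed to be a plugin binary that prints its listening address to stdout, as described later in this post.

package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"os/exec"
	"strings"
)

func main() {
	// Exec the plugin binary as a child process (the path is illustrative).
	cmd := exec.Command("./plugin/greeter")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// The plugin prints one line like "unix|/tmp/plugin123" telling the
	// main service where to connect.
	line, err := bufio.NewReader(stdout).ReadString('\n')
	if err != nil {
		log.Fatal(err)
	}
	parts := strings.SplitN(strings.TrimSpace(line), "|", 2)
	if len(parts) != 2 {
		log.Fatalf("unexpected handshake line: %q", line)
	}

	// Dial the advertised address; RPC then runs over this connection.
	conn, err := net.Dial(parts[0], parts[1])
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to plugin at", conn.RemoteAddr())
}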

Hashicorp Plugin Design

Let’s take a deeper look. To make a plugin, the plugin developer needs to implement the following interface.

type Plugin interface {
	Server(*MuxBroker) (interface{}, error)
	Client(*MuxBroker, *rpc.Client) (interface{}, error)
}

There must be a service that the plugin provides, so the plugin returns that service's server and client to its users. Let's say there is a service called Greeter, which returns a greeting message to the client. The plugin developer implements the plugin so that it returns a client and a server: the server returned from the plugin knows how to serve the Greeter service, and the client knows how to make requests to that server.

type Greeter interface {
	Greet() string
}

GreeterPlugin knows how to create the server and client of the service: GreeterRPCServer and GreeterRPC. When GreeterRPCServer receives a request, it calls the service implementation and responds to the client with the returned value. GreeterRPC takes care of making requests and handling errors.

You may wonder who should write the GreeterRPCServer and GreeterRPC logic; this glue code is the plugin developer's responsibility, as shown below.

import (
	"net/rpc"
	// the powerstrip package import is omitted here for brevity
)

// GreeterRPC is the client of the Greeter service
type GreeterRPC struct {
	client *rpc.Client
}

func (g *GreeterRPC) Greet() string {
	var resp string
	err := g.client.Call("Plugin.Greet", new(interface{}), &resp)
	if err != nil {
		panic(err)
	}
	return resp
}

// GreeterRPCServer is the server of the Greeter service
type GreeterRPCServer struct {
	Impl Greeter
}

func (s *GreeterRPCServer) Greet(args interface{}, resp *string) error {
	*resp = s.Impl.Greet()
	return nil
}

// GreeterPlugin is the plugin which enables the user to use the Greeter service
type GreeterPlugin struct {
	Impl Greeter
}

func (p *GreeterPlugin) Server(*powerstrip.MuxBroker) (interface{}, error) {
	return &GreeterRPCServer{Impl: p.Impl}, nil
}

func (p *GreeterPlugin) Client(b *powerstrip.MuxBroker, c *rpc.Client) (interface{}, error) {
	return &GreeterRPC{client: c}, nil
}

So far we've looked at this from the perspective of the plugin system. Let's now take the product developer's view, and call this a plugin ‘module’ rather than a ‘system’. The product here means the services provided to clients, where some functionality can be extended through plugins. A plugin service provider, which may be a team in the same product group or in another company, implements the service interface, and the result can be attached to the product easily. For example, in Terraform there is an interface for being a Terraform resource: developers who want to create their own Terraform resource implement that interface, and the Terraform SDK, which knows how to build plugins from the given implementations, turns it into a custom Terraform resource.

Likewise, the product developer provides an interface to providers, and those providers implement it to add their own functionality while using the product. At first glance this looks like an ordinary interface-and-implementation relationship; there are lots of examples where one side defines an interface and others supply implementations that can be swapped easily. But using RPC makes the difference.

In Fig. 2, the provider implements the interface and those implementations are dependency-injected (DI) into the application. This is the ordinary case I just described, and we can see this pattern in lots of applications. From the point of view of the application and the service implementation, when the application is built these implementations are built too, and vice versa. In other words, it is hard to separate the lifecycle of the product code from that of the plugin service code.
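
As a contrast, here is a minimal sketch of the Fig. 2 style. The Greeter interface is repeated so the sketch is self-contained, and App, NewApp, and EnglishGreeter are illustrative names of my own.

package main

import "fmt"

// Greeter is the same service interface as above.
type Greeter interface {
	Greet() string
}

// EnglishGreeter is an implementation compiled into this binary.
type EnglishGreeter struct{}

func (EnglishGreeter) Greet() string { return "Hello!" }

// App receives its Greeter by constructor injection (the Fig. 2 style).
type App struct {
	greeter Greeter
}

func NewApp(g Greeter) *App {
	return &App{greeter: g}
}

func (a *App) Run() {
	fmt.Println(a.greeter.Greet())
}

func main() {
	// Changing EnglishGreeter requires rebuilding and redeploying App too:
	// the application and the implementation share a single lifecycle.
	app := NewApp(EnglishGreeter{})
	app.Run()
}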

In Fig. 3, when using the plugin system, the application runs the binary of the service implementation, so there is no hard coupling between their lifecycles. Also, as we've seen, if a plugin service has a problem, it has no impact on the whole product or on the other plugin services.

This kind of design seems useful when the product team and the plugin service teams have different lifecycles and when there are silos between teams. On the other hand, if both teams can work closely together, or if the plugin is critical to the application, this design increases test complexity and test time.

Think about Terraform: cloud enterprises like AWS and GCP each have deep domain knowledge of their own clouds, and because there are silos between Hashicorp and those enterprises, it is very hard for them to work in a single code base. So Hashicorp provides a way to extend functionality easily through the plugin system, and the cloud enterprises, at relatively low cost, can leverage the useful IoC concepts to help clients easily manage their cloud resources.

Fig. 2. Service is dependency-injected into Application
Fig. 3. Application execs the service binary and communicates with it

Hashicorp Plugin Implementation

Now, let's look at the plugin system implementation. We will not cover all the details, only a few parts that I personally find important. First, take a look at how a simple plugin is served.

In this example, the implementation of the plugin service Greeter is the GreeterHello struct. Here the implementation is trivial, but in practice it could be hundreds or thousands of lines of code.

The GreeterHello implementation is put into the plugin map and served by the plugin framework. The powerstrip package is, as mentioned before, a simplified version of the Hashicorp plugin modules. In the plugin map, the key is the id of the plugin; in this example it is "greeter". The client looks up the plugin with this id.

After building this code with go build -o greeter, we have a plugin service binary; this is the plugin-service in Fig. 1.

type GreeterHello struct{}

func (g *GreeterHello) Greet() string {
	return "Hello!"
}

func main() {
	greeter := &GreeterHello{}
	var pluginMap = map[string]powerstrip.Plugin{
		"greeter": &common.GreeterPlugin{Impl: greeter},
	}
	powerstrip.Serve(&powerstrip.ServeConfig{
		Plugins: pluginMap,
	})
}

Next is the part that uses the plugin. As when serving the plugin, we build a plugin map with the same key and put this map into the ClientConfig, and in the Cmd field we specify the path to the plugin service binary we built before. When the client starts, it takes the plugin service binary and executes it as a child process; after that, the client looks up the plugin service by its key and calls the service.

These steps are done by running the Protocol() and Dispense() methods.

Protocol() determines how the client process communicates with the server process. It can return either a net/rpc client or a gRPC client; in powerstrip, it only returns net/rpc clients. Inside Protocol(), the client execs the server process and prepares to connect to it with RPC.

Next, with Dispense(), the client can retrieve the plugin by its id. In the example, the client gets the Greeter plugin service with the id "greeter" and calls the service.

var pluginMap = map[string]powerstrip.Plugin{
	"greeter": &common.GreeterPlugin{},
}

func main() {
	client := powerstrip.NewClient(&powerstrip.ClientConfig{
		Plugins: pluginMap,
		Cmd:     exec.Command("./plugin/greeter"),
	})
	defer client.Kill()

	rpcClient, err := client.Protocol()
	if err != nil {
		log.Fatal(err)
	}
	raw, err := rpcClient.Dispense("greeter")
	if err != nil {
		log.Fatal(err)
	}
	greeter := raw.(common.Greeter)
	fmt.Println(greeter.Greet())
}

How the Client Process Creates the Server Process and Connects to It

Now we are going to see how the client creates the server process and connects to it.

When the client calls Protocol(), it calls Start(), and then the RPC client tries to connect to the server. So after Start() returns, the server process has already been created and is in a listening state.

func (c *Client) Protocol() (ClientProtocol, error) {
	_, err := c.Start()
	...
	c.proto, err = newRPCClient(c)
	...
	return c.proto, nil
}

func newRPCClient(c *Client) (*RPCClient, error) {
	conn, err := net.Dial(c.addr.Network(), c.addr.String())
	...
}

The Start() code snippet is a little longer, but these are the parts needed to understand how the client creates the server and connects to it.

First, we take the server process's stdout and stderr, then exec it with cmd.Start(). After that, the client runs two goroutines. One waits for the server process to exit; this is not only for updating the flag that tracks the server's status, but also for handling the case where the server exits before the client manages to connect to it. You can find this logic in the <-c.doneCtx.Done() case of the select statement. The other goroutine reads the lines printed to the server process's stdout and passes them to the linesCh channel. The server prints the address the client should dial, and in the line := <-linesCh case of the select statement, the client receives that address.

func (c *Client) Start() (net.Addr, error) {
	cmdStdout, err := cmd.StdoutPipe()
	...
	cmdStderr, err := cmd.StderrPipe()
	...
	err = cmd.Start()
	...
	c.proc = cmd.Process

	// when the server process exits, the client updates its flag
	c.doneCtx, c.ctxCancel = context.WithCancel(context.Background())
	c.clientWg.Add(1)
	go func() {
		defer c.ctxCancel()
		defer c.clientWg.Done()
		...
		err := cmd.Wait()
		...
		c.exited = true
	}()

	// the client reads the lines printed to stdout of the server process
	linesCh := make(chan string)
	c.clientWg.Add(1)
	go func() {
		defer c.clientWg.Done()
		defer close(linesCh)

		sc := bufio.NewScanner(cmdStdout)
		for sc.Scan() {
			linesCh <- sc.Text()
		}
	}()

	select {
	case <-timeout:
		// return with error
	case <-c.doneCtx.Done():
		// return with error
	case line := <-linesCh:
		// get the server address `addr`
	}
	c.addr = addr
	return addr, nil
}

Next, we are going to see how the server passes its address to connect to the client.

In serverListener(), the server creates a temporary file backing a Unix domain socket for communicating with the client; besides a Unix socket, the server can also communicate with the client over TCP.

With fmt.Printf(), the server prints a "Protocol|Address" formatted string to its stdout. These characters are delivered to the client, and the client tries to connect to this address.

func Serve(opts *ServeConfig) {
	...
	lis, err := serverListener()
	...
	server := &RPCServer{
		Plugins: opts.Plugins,
		Stdout:  stdoutReader,
		Stderr:  stderrReader,
		DoneCh:  doneCh,
	}
	...
	// prints a "Protocol|Address" formatted string to stdout
	fmt.Printf("%s|%s\n",
		lis.Addr().Network(),
		lis.Addr().String())
	os.Stdout.Sync()
	...
	go server.Serve(lis)
}

func serverListener() (net.Listener, error) {
	tf, err := ioutil.TempFile("", "plugin")
	...
	path := tf.Name()
	...
	l, err := net.Listen("unix", path)
	...
}
func (c *Client) Start() (net.Addr, error) {
	...
	select {
	case <-timeout:
		// return with error
	case <-c.doneCtx.Done():
		// return with error
	case line := <-linesCh:
		line = strings.TrimSpace(line)
		parts := strings.SplitN(line, "|", 2)
		if len(parts) < 2 {
			return nil, fmt.Errorf("unexpected address format: %s", line)
		}
		switch parts[0] {
		case "tcp":
			addr, err = net.ResolveTCPAddr("tcp", parts[1])
		case "unix":
			addr, err = net.ResolveUnixAddr("unix", parts[1])
		default:
			err = fmt.Errorf("unknown address type: %s", parts[0])
		}
	}
}

How the Client Selects a Plugin and Calls Its Methods

So far, we've seen how the client creates a server process and connects to it. Now let's see how the client gets the plugin it needs. (The code in this section is not actually needed if you are using the gRPC protocol, since most of it is generated from the protobuf definitions, but it's worth getting a feel for it.)

We get the RPCClient by calling the Protocol() method. The RPCClient can get the plugin it needs by calling the Dispense() method. Inside Dispense(), it first checks whether a plugin with the given name exists, then calls the RPC server method named "Dispenser.Dispense" and gets back an id of type uint32. With this id, it obtains the client of the plugin and returns it.

func (c *RPCClient) Dispense(name string) (interface{}, error) {
	p, ok := c.plugins[name]
	if !ok {
		return nil, fmt.Errorf("unknown plugin type: %s", name)
	}
	var id uint32
	if err := c.control.Call(
		"Dispenser.Dispense", name, &id); err != nil {
		return nil, err
	}
	conn, err := c.broker.Dial(id)
	if err != nil {
		return nil, err
	}
	return p.Client(c.broker, rpc.NewClient(conn))
}

You can get the whole picture when you see the other side: the server.

In ServeConn(), you can find the code that registers the dispenseServer's methods under the prefix "Dispenser". These are the methods the client calls to get a plugin.

If you look at the dispenseServer.Dispense() method, it finds the plugin by name, gets its server implementation, and then registers that plugin's methods under the prefix "Plugin".

In this post, I will not go deep into the MuxBroker, but it is a really interesting piece as well. Hashicorp built its own multiplexing layer: it multiplexes a single connection into units called streams, and the MuxBroker uses this protocol. Within a single connection, each stream is allocated an id, and streams with unique ids can transport data without interfering with each other, so it feels as if we are communicating over multiple connections. If you are curious about the details, you can find them here.

func (s *RPCServer) Serve(lis net.Listener) {
	for {
		conn, err := lis.Accept()
		...
		go s.ServeConn(conn)
	}
}

func (s *RPCServer) ServeConn(conn io.ReadWriteCloser) {
	...
	server := rpc.NewServer()
	...
	server.RegisterName("Dispenser", &dispenseServer{
		broker:  broker,
		plugins: s.Plugins,
	})
	server.ServeConn(control)
}

type dispenseServer struct {
	broker  *MuxBroker
	plugins map[string]Plugin
}

func (d *dispenseServer) Dispense(name string, response *uint32) error {
	p, ok := d.plugins[name]
	...
	impl, err := p.Server(d.broker)
	...
	id := d.broker.NextId()
	*response = id
	...
	go func() {
		conn, err := d.broker.Accept(id)
		...
		serve(conn, "Plugin", impl)
	}()
	return nil
}

func serve(conn io.ReadWriteCloser, name string, v interface{}) {
	server := rpc.NewServer()
	err := server.RegisterName(name, v)
	...
	server.ServeConn(conn)
}
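
To get a feel for the stream multiplexing the MuxBroker mentioned above relies on, here is a minimal, self-contained sketch using github.com/hashicorp/yamux, the Hashicorp multiplexing library it builds on. The net.Pipe setup and the messages are purely illustrative.

package main

import (
	"fmt"
	"net"
	"sync"

	"github.com/hashicorp/yamux"
)

func main() {
	// One "physical" connection, represented here by an in-memory pipe.
	clientConn, serverConn := net.Pipe()

	var wg sync.WaitGroup
	wg.Add(2)

	go func() {
		session, err := yamux.Server(serverConn, nil)
		if err != nil {
			panic(err)
		}
		// Each Accept returns one independent logical stream.
		for i := 0; i < 2; i++ {
			stream, err := session.Accept()
			if err != nil {
				panic(err)
			}
			go func(s net.Conn) {
				defer wg.Done()
				buf := make([]byte, 16)
				n, _ := s.Read(buf)
				fmt.Printf("server received: %s\n", buf[:n])
				s.Close()
			}(stream)
		}
	}()

	session, err := yamux.Client(clientConn, nil)
	if err != nil {
		panic(err)
	}
	// Two streams share the single underlying connection without
	// interfering with each other.
	for _, msg := range []string{"stream-1", "stream-2"} {
		stream, err := session.Open()
		if err != nil {
			panic(err)
		}
		stream.Write([]byte(msg))
		stream.Close()
	}
	wg.Wait()
}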

Hashicorp Plugin Behavior in Terraform

How is this plugin module actually used in Hashicorp products? Let's look at the famous one: Terraform.

All the resources and data sources in Terraform are created by providers. A provider creates its own resources by using the plugin framework provided by Terraform. These resources form the plugin-service, and Terraform is the main-service, from the perspective of Fig. 1.

Each provider writes how to CRUD its resources using the Terraform plugin SDK.

This image is from Terraform's official tutorial page.
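
As a rough idea of what "writing how to CRUD a resource" looks like, here is a minimal sketch assuming the older terraform-plugin-sdk/v2 helper/schema style (rather than the framework used below); the resourceOrder* functions are assumed helpers implementing each operation.

package hashicups

import (
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// resourceOrder wires one CRUD callback per lifecycle operation, plus the
// resource's attribute schema. The callback implementations are omitted.
func resourceOrder() *schema.Resource {
	return &schema.Resource{
		CreateContext: resourceOrderCreate,
		ReadContext:   resourceOrderRead,
		UpdateContext: resourceOrderUpdate,
		DeleteContext: resourceOrderDelete,
		Schema: map[string]*schema.Schema{
			"items": {
				Type:     schema.TypeList,
				Required: true,
				Elem:     &schema.Schema{Type: schema.TypeString},
			},
		},
	}
}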

Let's see how custom resources are created with the HashiCups example. In the main.go file, the hashicups.New() function creates a tfsdk.Provider, and terraform-plugin-go wraps the tfsdk.Provider plugin service and registers it as a plugin. Although this code uses the gRPC type, isn't it quite similar to the code above?

In this way, the provider creates the plugin-service in Fig. 1 using the framework.

(It is also quite interesting to look at the relation between terraform-plugin-framework and terraform-plugin-go: terraform-plugin-framework works as an adapter between the provider side and the product core side, handling errors and the more complex configuration concerns.)

Below is the user's custom plugin module code, which uses ‘terraform-plugin-framework’:

func main() {
	tfsdk.Serve(context.Background(), hashicups.New, tfsdk.ServeOpts{
		Name: "hashicups",
	})
}

Below is the code of ‘terraform-plugin-framework’, which uses ‘terraform-plugin-go’:

func Serve(ctx context.Context, factory func() Provider, opts ServeOpts) error {
	return tf6server.Serve(opts.Name, func() tfprotov6.ProviderServer {
		return &server{
			p: factory(),
		}
	})
}

Below is the code of ‘terraform-plugin-go’:

func Serve(name string, serverFactory func() tfprotov6.ProviderServer, opts ...ServeOpt) error {
	serveConfig := &plugin.ServeConfig{
		...
		Plugins: plugin.PluginSet{
			"provider": &GRPCProviderPlugin{
				GRPCProvider: serverFactory,
			},
		},
		GRPCServer: plugin.DefaultGRPCServer,
	}
	...
	plugin.Serve(serveConfig)
	return nil
}

The Terraform code base is huge, so it's hard to include all the code here, and it would not be very readable. Instead, I just link the code and describe the flow of Terraform.

When a user runs terraform plan and similar commands, Terraform gets the state of the modules from the backend and creates the context. Inside the context there is a provider map keyed by resource type, and each provider knows how to CRUD its resources. With the context and state, Terraform then builds the graph.

The GraphWalker walk()s the graph and, depending on the graph node type, invokes the corresponding callback. If the node is executable, meaning it can be planned, destroyed, applied, imported, and so on, its task is run with that node's context.

When the task runs, it takes the provider for the resource type and calls the plugin interface methods it needs.
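
To summarize that flow, here is a hypothetical sketch; all names are illustrative stand-ins and do not match Terraform's real internals.

package terraformsketch

import "fmt"

// Provider is a stand-in for the plugin interface a real provider serves.
type Provider interface {
	Apply(resourceName string) error
}

// node is a stand-in for an executable graph node.
type node struct {
	resourceType string // e.g. "aws_instance"
	name         string
}

// walkContext carries the provider map built when the context is created.
type walkContext struct {
	providers map[string]Provider // keyed by resource type
}

// walk visits the (already ordered) graph nodes and runs each task with
// the provider registered for its resource type.
func (c *walkContext) walk(graph []node) error {
	for _, n := range graph {
		provider, ok := c.providers[n.resourceType]
		if !ok {
			return fmt.Errorf("no provider for %s", n.resourceType)
		}
		if err := provider.Apply(n.name); err != nil {
			return err
		}
	}
	return nil
}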

ETC

Testing the Client

One interesting thing I noticed in the plugin module's code is the test code. I was curious how to create a server process when testing the client, and the answer is to create a test double of the server process using a "test case".

When writing the test code, we call helperProcess(). Inside helperProcess(), it creates a cmd that runs a specific test case. In the code below you can see it building a cmd with the -test.run=TestHelperProcess argument, which runs the TestHelperProcess test case.

The TestHelperProcess test case defines how to act as the server process depending on the given type. For example, if the type is "mock", it prints "tcp|:1234" to stdout, and the test code can then check whether the client connects to the server successfully.

func TestClient(t *testing.T) {
	proc := helperProcess("mock")
	c := NewClient(&ClientConfig{
		Cmd:     proc,
		Plugins: testPluginMap,
	})
	defer c.Kill()

	addr, err := c.Start()
	...
}

Below are the helperProcess() test helper and TestHelperProcess, both used in the client test code:

func helperProcess(s ...string) *exec.Cmd {
	cs := []string{"-test.run=TestHelperProcess", "--"}
	cs = append(cs, s...)
	env := []string{
		"GO_WANT_HELPER_PROCESS=1",
	}
	cmd := exec.Command(os.Args[0], cs...)
	cmd.Env = append(env, os.Environ()...)
	return cmd
}

func TestHelperProcess(t *testing.T) {
	if os.Getenv("GO_WANT_HELPER_PROCESS") != "1" {
		return
	}
	...
	cmd, args := args[0], args[1:]
	switch cmd {
	case "mock":
		fmt.Printf("tcp|:1234\n")
		<-make(chan int)
	...
	}
}
