`create` operation: detect "continuation", otherwise fallout such as key material split-brain (idempotency violation)
With each `create` operation, the CLI generates fresh key material early in the process: an RSA key pair, and derived authentication tokens for the data API. It does that before doing any remote state inspection.
When the Opstrace cluster already existed before invoking the current `create` operation, the `create` operation will push the new public key into the existing cluster. However, the deployments will not pick it up, and subsequently the CLI will use the new (bad) authentication tokens for probing cluster readiness in the last phase of said `create` operation.
That will fail with something like:

```
[2020-11-24T20:57:27Z] 2020-11-24T20:57:27.056Z info: https://loki-external.default.bk-2962-d71-a.opstrace.io:8443/loki/api/v1/labels: still waiting, unexpected HTTP response
[2020-11-24T20:57:27Z] 2020-11-24T20:57:27.543Z debug: HTTP response details:
[2020-11-24T20:57:27Z]   status: 401
[2020-11-24T20:57:27Z]   body[:500]: bad authentication token
```
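To make the failure mode concrete, here is a minimal sketch of the ordering problem, with hypothetical names (this is not the actual Opstrace CLI code): key material is generated unconditionally, before we know whether a cluster already exists.

```typescript
// Hypothetical sketch of the ordering bug described above; all names are
// illustrative, not the real CLI internals.
interface KeyMaterial {
  publicKey: string;
  tenantApiTokens: Map<string, string>;
}

// Stand-in for RSA key pair generation plus token derivation.
function generateKeyMaterial(tenants: string[]): KeyMaterial {
  const tenantApiTokens = new Map(
    tenants.map((t): [string, string] => [t, `fresh-token-${t}`]),
  );
  return { publicKey: "fresh-rsa-public-key", tenantApiTokens };
}

function createCluster(clusterAlreadyExists: boolean, tenants: string[]): string {
  // Problem: this runs before any remote state inspection.
  const keys = generateKeyMaterial(tenants);
  if (clusterAlreadyExists) {
    // The new public key gets pushed, but the deployments keep validating
    // against the old one: readiness probes using `keys.tenantApiTokens`
    // are rejected with HTTP 401 ("bad authentication token").
    return "split-brain";
  }
  return "fresh-cluster";
}
```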
This is a known limitation that is not yet documented; and a good reason to create this ticket.
This behavior is also certainly a violation of an idempotency constraint that we talk about every now and then (where we think of, and want, the `create` operation to be idempotent).
For example, from the current quickstart documentation:

> So you know: The CLI is re-entrant, so if you kill it or it dies, it will pick up right where it left off.
This topic raises so many interesting points!
First-level thinking: we could detect when the k8s cluster & cluster-internal config (controller config) already exist; in that case, do not regenerate key material and data API auth tokens, but try to read the existing authentication tokens (and fail when they can't be discovered).
Second-level thinking: the previous thought reveals a bigger-picture insight: when the cluster already exists, we don't want to overwrite any part of the cluster-internal config state -- which is not well-specified yet.
Third-level thinking: OK, we can explicitly ignore the user-given config, emit a clear warning message, and move on with the Nth `create` operation on the same cluster; trying to inform the admin via log messages that we just ignored the config they provided.
Fourth-level thinking: but this kind of fallback would be too magic -- we'd be ignoring the user-given config file (or parts of it!) without providing a clear signal. No, this must lead to a non-zero exit of the current `create` operation.
Fifth-level thinking: this means the same command run twice (unaltered) can't just magically do the right thing in an idempotent fashion. There's a logical conflict here. Even with a proper config upgrade mechanism we should not magically switch between 'initial create' and 'config upgrade'.
What's the value of the idempotency constraint? I think we have to see that it is a little naive -- because we have to think about every aspect of the cluster configuration when discussing idempotency or "continuation" of a previous partial `create`.
## Long term: introduce a proper cluster config diff and mutation design
We could specify "well" what it means to overwrite the config (and apply all changes). That could resolve this conflict. Each `create` operation could simply set the current config, including key material. That sounds like a big, interesting project for later. Done properly, that requires, for example, performing key rotation in the API proxies without downtime.
But note: even with a proper config upgrade mechanism we should not magically switch between 'initial create' and 'config upgrade'. That must be an explicit user choice.
## Short term: idempotency for cloud infra, pragmatic explicit continuation mode for the k8s cluster
We could make the idempotency constraint a little weaker, and apply it only to the cloud infrastructure setup that happens before any k8s cluster interaction. I think this would be pragmatic and valuable! That is: implicit continuation, with actual idempotency when we talk about cloud infra, but not about the k8s cluster state itself.
How to distinguish those cases? We could, at the beginning of the `create` operation, check if a corresponding k8s/EKS cluster already exists (not limiting the search to the region specified via the config file, but searching across all regions).
Question: do we allow the same Opstrace cluster name to be re-used across regions in the same cloud account? No (for now, to have something to base decisions on -- but probably forever; that's just too confusing). This decision is already applied elsewhere, but needs to be written down.
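The cross-region existence check can be factored as a pure lookup over per-region cluster listings (a real implementation would call the cloud provider's list API per region; the listing shape and names here are assumptions, not the actual CLI code):

```typescript
// region name -> k8s cluster names found in that region (hypothetical shape).
type RegionListing = Record<string, string[]>;

// Returns all regions in which a cluster with the given Opstrace name exists.
// Given the name-uniqueness decision above, more than one match indicates a
// state we have decided to disallow.
function findClusterRegions(listings: RegionListing, opstraceName: string): string[] {
  return Object.entries(listings)
    .filter(([, names]) => names.includes(opstraceName))
    .map(([region]) => region);
}
```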
If we do not find such a k8s cluster:

- Simply continue -- with key material regeneration :) -- and cloud infra creation (in an idempotent fashion, i.e. this will automatically pick up where we previously left off).
If we find such a k8s cluster:

- If it does not have a controller deployment and no Opstrace-specific config map set: continue. There's no conflict here (again: idempotency).
- If it has a controller deployment and any config map set: abort and exit non-zero, saying that we don't yet have a config-diff-upgrade mechanism. Then offer a continuation run with a CLI flag (e.g. `--continue`) that will move past this point.
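The decision tree above can be sketched as a pure function (all names here are hypothetical, not the actual CLI implementation):

```typescript
// What a cluster probe at the start of `create` might discover (hypothetical).
interface ClusterProbe {
  k8sClusterExists: boolean;
  controllerDeployed: boolean;
  opstraceConfigMapSet: boolean;
}

type Decision = "continue-idempotent" | "continue-explicit" | "abort-non-zero";

function decide(probe: ClusterProbe, continueFlag: boolean): Decision {
  if (!probe.k8sClusterExists) {
    // No k8s cluster found in any region: regenerate key material and
    // create cloud infra idempotently.
    return "continue-idempotent";
  }
  if (!probe.controllerDeployed && !probe.opstraceConfigMapSet) {
    // Cluster shell exists but is unconfigured: no conflict.
    return "continue-idempotent";
  }
  // Controller deployment and config map present: only proceed with
  // explicit user consent (the `--continue` flag), otherwise abort.
  return continueFlag ? "continue-explicit" : "abort-non-zero";
}
```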
This `--continue` mode would not accept a cluster config, or would explicitly ignore various parts of the cluster configuration file; and it requires successful discovery of the authentication token files from a previous `create` run. To me, that's another argument for: authentication token files should actually always live in a directory, as an atomic unit; and we want to introduce a command line flag for discovering that directory.
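A minimal sketch of that discovery step, assuming a one-file-per-tenant layout inside a single token directory (the file naming scheme is an assumption for illustration, not the CLI's actual layout):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Discover authentication token files from a previous `create` run.
// Fails loudly when any tenant's token cannot be found -- in that case the
// continuation run must not proceed.
function discoverTokens(tokenDir: string, tenants: string[]): Map<string, string> {
  const tokens = new Map<string, string>();
  for (const tenant of tenants) {
    // Assumed layout: one token file per tenant, all in `tokenDir`.
    const p = path.join(tokenDir, `tenant-api-token-${tenant}`);
    if (!fs.existsSync(p)) {
      throw new Error(`cannot continue: token file for tenant '${tenant}' not found at ${p}`);
    }
    tokens.set(tenant, fs.readFileSync(p, "utf8").trim());
  }
  return tokens;
}
```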
I begin to see:

```
opstrace create aws jpdev --continue [--api-token-dir PATH]
```

- `--api-token-dir PATH` is for discovery in this case; and it has a default, matching the corresponding default for writing these files (upon the first, happy-path `create`).
- The command above does not require reading a cluster config document. Maybe we still want to do that, so that one can simply append a `--continue` for the second command invocation (so that one does not need to remove `-c` or stdin?).
Definite TODO: write authentication tokens to disk only when we are certain that we are creating a new cluster. That means: delay writing the authentication tokens (compared to what we do today).
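That delayed write could look roughly like this (a sketch with hypothetical names; the file layout mirrors the token-directory idea above):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Keep freshly generated tokens in memory; persist them only once we are
// certain this `create` run is building a brand-new cluster. For an existing
// cluster, the previously written token files stay authoritative.
function writeTokensIfFresh(
  tokenDir: string,
  tokens: Map<string, string>,
  isFreshCluster: boolean,
): string[] {
  if (!isFreshCluster) {
    return [];
  }
  fs.mkdirSync(tokenDir, { recursive: true });
  const written: string[] = [];
  for (const [tenant, token] of tokens) {
    const p = path.join(tokenDir, `tenant-api-token-${tenant}`);
    fs.writeFileSync(p, token);
    written.push(p);
  }
  return written;
}
```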
And here we are, Nth-level thinking: this `create --continue` is basically the `status` command that we already have; or at least it's getting super close. I think: no, let's not do `opstrace create aws jpdev --continue [--api-token-dir PATH]` -- let's re-think `status` instead, and maybe call it differently:
```
opstrace wait aws jpdev [--api-token-dir PATH]
```
It will find the EKS cluster (looking for it in all regions), get the list of tenants from the cluster, and then use the API tokens to do its thing.
This would also resolve my major 'design complaint' about the current `status` implementation: it requires the config file, but shouldn't: https://github.com/opstrace/opstrace/blob/e44644d78f01659cf7d69ee44d6658a0f9119059/packages/cli/src/status.ts#L59
## Misc

Other thoughts triggered by this discussion:

- For readiness, do not rely on HTTP / data API interaction -- probe readiness via k8s API means only (the proper solution?).
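The k8s-API-only readiness check boils down to comparing desired versus ready replica counts per deployment; a sketch of that core predicate, with the wiring to a real k8s client omitted (the status shape here loosely mirrors the Deployment spec/status fields, names are illustrative):

```typescript
// Minimal per-deployment readiness snapshot (hypothetical shape; a real
// implementation would read spec.replicas and status.readyReplicas from
// the Kubernetes API).
interface DeploymentReadiness {
  name: string;
  specReplicas: number;
  readyReplicas: number;
}

// The cluster counts as ready when every deployment has at least as many
// ready replicas as its spec demands -- no data-API HTTP probing involved.
function allDeploymentsReady(deployments: DeploymentReadiness[]): boolean {
  return deployments.every((d) => d.readyReplicas >= d.specReplicas);
}
```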