A Complete Guide to Internal PKI with ACME and Certificate Automation

1. Introduction: Why Private CAs Matter Again

For a long time, internal PKI was treated as a necessary evil: complex, manual, brittle, and best avoided if possible. Certificates were issued manually, renewed infrequently, and often shared across systems. This approach barely worked when infrastructure was static. It completely breaks down in modern environments.

Today’s infrastructure is dominated by:

Microservices and APIs communicating over the network
Kubernetes clusters with ephemeral workloads
Service meshes enforcing mutual TLS (mTLS)
Zero Trust architectures that assume no implicit trust

In this world, every workload is an identity, and TLS certificates are no longer just about encryption—they are the primary authentication mechanism.

Public Certificate Authorities (CAs) solved certificate automation for the public internet, but they are fundamentally unsuitable for internal infrastructure. Internal services use private DNS names, IP identifiers, SPIFFE IDs, device identities, and non-public trust roots. They require tighter policy control, shorter lifetimes, and integration with internal identity systems.

This is where Private CAs (such as BastionXP) combined with the ACME protocol become critical. ACME transformed certificate management from a manual, error-prone process into a fully automated system with a rich ecosystem of clients and integrations. When applied to private PKI, ACME becomes more than a certificate protocol—it becomes an identity automation control plane.

2. Fundamentals of PKI

At its core, PKI is about trust delegation.

A Root CA acts as the trust anchor. It is usually generated offline, protected aggressively, and rarely used. Its only job is to sign one or more Intermediate (Issuing) CAs.

Intermediate CAs perform the actual certificate issuance. They are online, operationally accessible, and have a limited blast radius. If compromised, they can be revoked without destroying the entire trust ecosystem.

Certificates themselves follow a lifecycle:

Key generation
Certificate issuance
Distribution and trust
Renewal
Expiration or revocation

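In modern PKI, expiration increasingly takes the place of revocation. Short-lived certificates (hours or days) dramatically reduce the need for CRLs and OCSP while improving security posture, because a compromised certificate stops being useful within a short, bounded window.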
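
Short lifetimes only work if renewal is scheduled automatically. The sketch below computes when a client would start renewing a certificate. The function name renewal_time and the two-thirds threshold are assumptions chosen for illustration, not values taken from any particular CA or client.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime, fraction: float = 2 / 3) -> datetime:
    """Return the instant at which renewal should begin.

    `fraction` is the share of the validity period allowed to elapse
    before renewing (an assumed default, not a standard).
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    if not_after <= not_before:
        raise ValueError("not_after must be later than not_before")
    return not_before + (not_after - not_before) * fraction


# Hypothetical 24-hour certificate.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)
print(renewal_time(issued, expires, fraction=0.5))
```

Renewing well before expiry leaves room for retries if the CA is briefly unavailable.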

Only a subset of X.509 fields truly matter in practice:

Subject Alternative Name (SAN) is authoritative; the Subject is mostly legacy
SANs may include DNS names, IP addresses, URIs, or SPIFFE IDs
Extended Key Usage (EKU) defines what the certificate is allowed to do
Key Usage (KU) restricts cryptographic operations
Certificate Policies encode higher-level constraints

Trust distribution is as important as issuance. Trust roots must be distributed to:

Operating systems
Container images
Language runtimes
Service meshes
Custom applications

Traditional PKI tooling (OpenSSL, manual CSRs, ad-hoc scripts) provides cryptographic primitives but lacks automation, policy enforcement, and observability—making it unsuitable for modern infrastructure.

3. ACME Protocol: Internals and Flow

ACME (Automated Certificate Management Environment) standardizes certificate issuance and renewal. Its success comes not from cryptographic innovation, but from operational automation.

An ACME system consists of:

A client (e.g., certbot, lego, acme.sh)
A server (public or private)
An account representing the client’s identity
Orders that request certificates
Authorizations that prove control over identifiers
Challenges that validate those authorizations

The ACME lifecycle follows a well-defined state machine:

The client creates or reuses an account
It submits a new order with requested identifiers (domains, IP addresses, device IDs)
The server creates authorizations for each identifier
The client completes challenges (http-01, dns-01, tls-alpn-01, device-attest-01)
The client finalizes the order
The server issues the certificate
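
The order portion of this lifecycle can be sketched as a small state machine. The status names follow the order state diagram in RFC 8555 (section 7.1.6); the function name advance and the dictionary layout are illustrative assumptions, not part of any ACME server's API.

```python
# Legal order status transitions, following RFC 8555 section 7.1.6.
ORDER_TRANSITIONS = {
    "pending": {"ready", "invalid"},      # all authorizations valid -> ready
    "ready": {"processing", "invalid"},   # finalize request received
    "processing": {"valid", "invalid"},   # certificate issued or error
    "valid": set(),                       # terminal
    "invalid": set(),                     # terminal
}


def advance(current: str, new: str) -> str:
    """Return `new` if the transition is legal, otherwise raise ValueError."""
    allowed = ORDER_TRANSITIONS.get(current)
    if allowed is None:
        raise ValueError(f"unknown order status: {current}")
    if new not in allowed:
        raise ValueError(f"illegal order transition: {current} -> {new}")
    return new


# Walk the happy path of an order.
status = "pending"
for next_status in ("ready", "processing", "valid"):
    status = advance(status, next_status)
print(status)
```

Rejecting illegal transitions explicitly makes it harder for a buggy client or handler to skip validation.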

ACME uses signed JWS requests, nonces to prevent replay attacks, and asynchronous polling to handle validation delays. Clients are expected to tolerate retries, state transitions, and temporary failures.
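
Replay protection with nonces comes down to "issue once, accept once." The following is a minimal single-process sketch. The class name NonceStore is hypothetical, and a real server would keep nonces in a shared store with expiry rather than in an in-memory set.

```python
import secrets


class NonceStore:
    """Tracks nonces that have been issued but not yet used."""

    def __init__(self) -> None:
        self._unused = set()

    def issue(self) -> str:
        """Create a fresh, unpredictable nonce and remember it."""
        nonce = secrets.token_urlsafe(16)
        self._unused.add(nonce)
        return nonce

    def consume(self, nonce: str) -> bool:
        """Return True exactly once per issued nonce; False for replays or unknown values."""
        try:
            self._unused.remove(nonce)
            return True
        except KeyError:
            return False


store = NonceStore()
nonce = store.issue()
first_use = store.consume(nonce)   # accepted
replay = store.consume(nonce)      # rejected as a replay
print(first_use, replay)
```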

While RFC 8555 defines the core protocol, real-world ACME servers must also support:

TLS-ALPN-01 (RFC 8737)
IP identifiers (RFC 8738)
Device attestation (device-attest-01, specified in an IETF ACME working group Internet-Draft rather than a published RFC)
Client quirks that deviate from the strict spec

This is where many private ACME implementations fail: the protocol is simple, but the ecosystem is not.

4. ACME Challenge Types in a Private Context

Challenges exist to prove that a requester controls an identifier.

HTTP-01

HTTP-01 works by serving a token over plain HTTP at a well-known path on port 80. It is simple, but it assumes the ACME server can reach the requesting host over inbound HTTP, which is often incompatible with segmented private networks.
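
What the client actually serves is a key authorization: the challenge token joined to a thumbprint of the account key (RFC 8555 section 8.1, RFC 7638). The sketch below computes it with the standard library only. The JWK values are placeholders rather than a real key, and the helper names are illustrative assumptions.

```python
import base64
import hashlib
import json

# Members that RFC 7638 requires in the thumbprint input, per key type.
_REQUIRED_MEMBERS = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}


def b64url(data: bytes) -> str:
    """Base64url without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 JWK thumbprint over the required members only, keys sorted, no whitespace."""
    members = {name: jwk[name] for name in _REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, separators=(",", ":"), sort_keys=True)
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, jwk: dict) -> str:
    return f"{token}.{jwk_thumbprint(jwk)}"


def http01_path(token: str) -> str:
    """Path at which the key authorization must be served."""
    return f"/.well-known/acme-challenge/{token}"


# Placeholder account key: these x/y values are NOT a valid P-256 point.
account_jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y", "kid": "ignored"}
token = "example-token"
print(http01_path(token))
print(key_authorization(token, account_jwk))
```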

DNS-01

DNS-01 proves control by creating a TXT record. It works well internally if DNS is programmable, but introduces complexity around split-horizon DNS, propagation delays, and API credentials.
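
The TXT record itself is derived mechanically from the key authorization: the record lives at _acme-challenge.<domain> and its value is the base64url-encoded SHA-256 digest of the key authorization (RFC 8555 section 8.4). A minimal sketch follows; the function name dns01_record and the sample inputs are hypothetical.

```python
import base64
import hashlib


def dns01_record(domain: str, key_authorization: str) -> tuple:
    """Return (record_name, txt_value) for a DNS-01 challenge."""
    # Wildcard orders are validated against the base domain.
    base = domain.removeprefix("*.").rstrip(".")
    digest = hashlib.sha256(key_authorization.encode("utf-8")).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{base}.", value


# Hypothetical internal name and key authorization.
name, value = dns01_record("api.internal.example", "example-token.example-thumbprint")
print(name, value)
```

Publishing this record is the part that needs DNS API credentials, which is why the credential scope for the DNS provider deserves as much attention as the CA itself.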

TLS-ALPN-01

TLS-ALPN-01 is particularly well-suited for private environments. The client answers a TLS handshake from the ACME server on port 443 with a temporary self-signed certificate, negotiated under the dedicated ALPN protocol name acme-tls/1. No DNS changes or HTTP routing are required.

Device-Attest-01

Device attestation challenges bind certificates to hardware-backed identities such as TPMs or Secure Enclaves. This is critical for IoT, mobile devices, and Zero Trust device onboarding.

A well-designed private ACME server allows policy-driven challenge selection rather than a one-size-fits-all approach.
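
One way to picture policy-driven challenge selection is as an intersection between what the protocol permits for an identifier type and what the operator allows. The sketch below assumes that model. The table reflects RFC 8555, RFC 8737, and RFC 8738 (which excludes dns-01 for IP identifiers); the function name and the operator policy are hypothetical.

```python
# Challenge types the specifications permit per identifier type.
PROTOCOL_CHALLENGES = {
    "dns": {"http-01", "dns-01", "tls-alpn-01"},
    "ip": {"http-01", "tls-alpn-01"},  # RFC 8738: dns-01 is not usable for IP identifiers
}


def offered_challenges(identifier_type: str, operator_allowed: set) -> list:
    """Challenges to offer: protocol-permitted AND operator-allowed.

    Unknown identifier types yield an empty list, so nothing is offered by default.
    """
    permitted = PROTOCOL_CHALLENGES.get(identifier_type, set())
    return sorted(permitted & set(operator_allowed))


# Hypothetical operator policy: no inbound HTTP validation on this network.
operator_policy = {"dns-01", "tls-alpn-01"}
print(offered_challenges("dns", operator_policy))
print(offered_challenges("ip", operator_policy))
```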

5. Building a Private CA with an ACME Server

Building a private CA is not simply a matter of standing up an ACME endpoint and generating certificates. A production-ready private PKI must be designed as security infrastructure, not just a certificate factory.

At a minimum, a private CA system consists of:

A Root CA, generated offline and kept completely isolated
One or more Issuing (Intermediate) CAs, responsible for online signing
An ACME server, which acts as the control plane
A policy engine, enforcing who can get what
Audit, logging, and monitoring components

The Root CA exists solely to establish trust. It should never be used directly for issuance and should be protected using offline storage, strict access controls, and infrequent activation. The goal is to make the root boring, stable, and nearly invisible.

The Issuing CA is where operational risk lives. Its private key must be accessible to the ACME server, but this access should be mediated through an HSM or cloud KMS wherever possible. Many mature systems separate certificate policy enforcement from cryptographic signing so that compromise of one does not immediately imply compromise of the other.

The ACME server is the most misunderstood component. It is not just a protocol adapter; it is the decision-making layer of your PKI. It determines:

Which identities are allowed to request certificates
Which SANs are permitted
Which EKUs apply
How long certificates live
Which challenges are acceptable

This is why different ACME servers feel so different in practice. BastionXP emphasizes flexible ACME semantics and mTLS-first workflows for Zero Trust security. Step CA is an open source ACME server and supports basic ACME features in its free version. Boulder is specification-driven and complex. Teleport integrates certificates into a broader access control system but doesn’t support ACME. These are not just implementation details; they reflect different trust models.

Ultimately, building a private CA is about deciding where trust decisions live and how much automation you are willing to allow without human approval.

6. ACME Clients and Compatibility: The Hidden Complexity

One of ACME’s greatest strengths is also one of its biggest challenges: client diversity.

In the real world, ACME clients (Certbot, Lego, acme.sh, third-party ACME client libraries) behave very differently:

Some aggressively re-send state-changing POST requests instead of polling resource status
Some assume synchronous validation
Some refresh orders constantly
Some cache authorizations far longer than expected

These behaviors are often undocumented, inconsistent, and driven by assumptions inherited from Let’s Encrypt’s infrastructure.

A private ACME server that strictly follows the RFC but ignores these realities will fail in subtle and frustrating ways. This is why interoperability testing is critical. A robust ACME server must:

Treat POST-as-GET as first-class
Make all operations idempotent
Tolerate order refreshes at unexpected times
Gracefully handle duplicate finalization attempts
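
Handling duplicate finalization gracefully mostly means making finalize idempotent: a repeated request returns the existing certificate instead of signing again. The sketch below shows one lenient way to do that. The Orders class, its in-memory storage, and the fake signer are all hypothetical stand-ins for a real database and CA.

```python
import hashlib


class Orders:
    """In-memory order table; a real server would use a shared database."""

    def __init__(self, sign) -> None:
        self._sign = sign  # callable: CSR text -> certificate text
        self._orders = {}

    def create_ready(self, order_id: str) -> None:
        self._orders[order_id] = {"status": "ready", "csr_hash": None, "certificate": None}

    def finalize(self, order_id: str, csr: str) -> str:
        order = self._orders[order_id]
        csr_hash = hashlib.sha256(csr.encode("utf-8")).hexdigest()
        if order["status"] == "valid":
            # Duplicate finalize: return the existing result, never sign twice.
            if order["csr_hash"] != csr_hash:
                raise ValueError("order already finalized with a different CSR")
            return order["certificate"]
        if order["status"] != "ready":
            raise ValueError("orderNotReady")
        order["status"] = "processing"
        order["certificate"] = self._sign(csr)
        order["csr_hash"] = csr_hash
        order["status"] = "valid"
        return order["certificate"]


sign_calls = []


def fake_sign(csr: str) -> str:
    """Stand-in for the CA signing call; records each invocation."""
    sign_calls.append(csr)
    return f"certificate-for:{csr}"


orders = Orders(fake_sign)
orders.create_ready("order-1")
first = orders.finalize("order-1", "csr-a")
second = orders.finalize("order-1", "csr-a")  # client retry
print(first == second, len(sign_calls))
```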

From an operator’s perspective, ACME failures are notoriously opaque. Debugging becomes guesswork without detailed logging of:

Nonce issuance and reuse
Order state transitions
Authorization and challenge evaluation
Policy rejection reasons

In practice, the ACME server must be designed defensively, assuming that clients will behave imperfectly and that automation will surface edge cases far more quickly than human workflows ever did.

7. mTLS as Identity, Not Just Transport Security

In modern infrastructure, TLS certificates are identity documents.

With mutual TLS (mTLS), both sides of a connection authenticate using certificates. Encryption is almost incidental; the real value is in strong, cryptographic identity.

This fundamentally changes how certificates should be issued:

A server certificate is not just “for a hostname”
A client certificate is not just “for authentication”
Each certificate represents a workload, service, or device identity

ACME fits naturally into this model because it supports short-lived, automatically renewed certificates, which align perfectly with ephemeral workloads.

However, ACME was originally designed for server certificates. Using it for mTLS requires careful policy design:

EKUs must be strictly enforced (clientAuth vs serverAuth)
SANs must encode identity consistently
Certificate lifetimes should be measured in hours or days, not months

Many systems embed identity directly into SANs:

DNS names for services
URIs or SPIFFE IDs for workloads
Hardware identifiers for devices

Once identity is established, authorization must be externalized. Certificates should answer who you are, not what you can do. RBAC, ABAC, or policy engines then consume certificate attributes to make access decisions.

8. Policy & Identity Enforcement: Where Private PKI Actually Succeeds or Fails

If there is one place where private PKI systems fail in practice, it is policy and identity enforcement. Cryptography is deterministic. Certificate issuance is mechanical. Policy, however, is socio-technical — it encodes organizational trust decisions into software.

The core question a private CA must answer is deceptively simple:

Who is allowed to obtain which identity, and under what conditions?

In ACME-based systems, this question is resolved primarily at the account layer. An ACME account represents a logical requester, but by itself it is meaningless. A raw account key proves only possession of a private key, not legitimacy. Without binding the account to a real-world identity, ACME devolves into an unauthenticated certificate vending machine.

Binding ACME Accounts to Real Identity

Modern private CAs therefore bind ACME accounts to external identity sources, such as:

Kubernetes ServiceAccounts
Cloud IAM roles (AWS IAM, GCP IAM, Azure AD)
OIDC identities (workload identity, CI jobs)
SSH principals
Hardware-backed identities (TPM, Secure Enclave)

This binding step is where Zero Trust principles enter PKI. Instead of trusting network location or static credentials, the CA validates who or what is making the request right now.

Policy as a First-Class System

Once identity is established, the CA must apply certificate issuance policies. These policies govern:

Which SAN types are allowed (DNS, IP, URI, SPIFFE)
Which SAN patterns are permitted (e.g., *.svc.cluster.local)
Which EKUs are allowed (serverAuth, clientAuth)
Maximum certificate lifetimes
Which challenge types may be used
Renewal behavior and frequency

Critically, policy must be fail-closed. If the CA cannot determine whether a request complies with policy, it must refuse issuance. Most real-world PKI incidents occur not because policies were absent, but because they were too permissive or silently bypassed.
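
A fail-closed check can be sketched as a function whose only path to approval is passing every rule, with a reason returned for each rejection. Everything in this example is an assumption for illustration: the Policy fields, the identity string format, and the use of fnmatch-style patterns (in which * also matches dots).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    san_patterns: tuple
    ekus: frozenset
    max_lifetime_hours: int


# Hypothetical policy table keyed by an already-authenticated identity.
POLICIES = {
    "k8s:payments/api": Policy(
        san_patterns=("*.payments.svc.cluster.local",),
        ekus=frozenset({"serverAuth", "clientAuth"}),
        max_lifetime_hours=24,
    ),
}


def evaluate(identity: str, sans: list, ekus: list, lifetime_hours: int) -> tuple:
    """Return (allowed, reason). Any missing or unmatched input is a denial."""
    policy = POLICIES.get(identity)
    if policy is None:
        return False, "no policy bound to identity"
    if not sans:
        return False, "no SANs requested"
    for san in sans:
        # Note: fnmatch's * also matches dots, so nested names are accepted.
        if not any(fnmatchcase(san.lower(), pattern) for pattern in policy.san_patterns):
            return False, f"SAN not permitted: {san}"
    if not ekus or not set(ekus) <= policy.ekus:
        return False, "EKU not permitted"
    if not 0 < lifetime_hours <= policy.max_lifetime_hours:
        return False, "lifetime outside policy"
    return True, "ok"


print(evaluate("k8s:payments/api", ["api.payments.svc.cluster.local"], ["serverAuth"], 12))
print(evaluate("k8s:unknown/job", ["api.payments.svc.cluster.local"], ["serverAuth"], 12))
```

Returning the rejection reason alongside the decision also gives the audit log something concrete to record.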

9. Authorization After Authentication: Certificates Are Not Permissions

A subtle but critical distinction in PKI-based systems is the separation of authentication and authorization.

Certificates answer the question:

Who are you?

They should not answer:

What are you allowed to do?

This distinction is often violated in legacy systems, where access decisions are hard-coded into certificate contents. Modern systems avoid this mistake by treating certificates as identity assertions, not permission tokens.

Mapping Certificate Identity to Authorization Systems

Once a workload authenticates using mTLS, downstream systems map certificate attributes into authorization contexts:

SANs become service or workload identifiers
Certificate policies or OIDs become claims
Issuer chains define trust domains

Authorization engines (RBAC, ABAC, or policy-as-code systems) then decide access based on this context.
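
As a rough illustration of this separation, the sketch below maps a URI SAN from an already-verified client certificate to permissions held outside the certificate. The trust domain, the SPIFFE ID, the permission names, and the function is_allowed are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical bindings maintained by the authorization system, not the CA.
ROLE_BINDINGS = {
    "spiffe://example.internal/ns/payments/sa/api": {"ledger:read", "ledger:write"},
}


def is_allowed(uri_sans: list, trust_domain: str, action: str) -> bool:
    """Decide access from certificate identity; the certificate carries no permissions."""
    for san in uri_sans:
        parsed = urlparse(san)
        if parsed.scheme != "spiffe" or parsed.netloc != trust_domain:
            continue  # ignore identities from other schemes or trust domains
        if action in ROLE_BINDINGS.get(san, set()):
            return True
    return False


sans = ["spiffe://example.internal/ns/payments/sa/api"]
print(is_allowed(sans, "example.internal", "ledger:write"))
print(is_allowed(sans, "example.internal", "ledger:delete"))
```

Because the bindings live outside the certificate, revoking a permission is a table update rather than a reissuance.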

This design enables:

Fine-grained access control
Rapid policy updates without certificate reissuance
Reduced blast radius for mis-issuance

The private CA’s job ends at identity. Everything else should be delegated.

10. Automation & Infrastructure Integration: ACME as a Control Plane

The true power of ACME emerges when it is deeply integrated into infrastructure automation. In these environments, certificates are not requested by humans — they are requested by systems.

Kubernetes as the Canonical Example

In Kubernetes, workloads are ephemeral by design. Pods come and go, IPs change, and service identities must follow workloads dynamically.

ACME integrates into this ecosystem through tools like cert-manager, which:

Watches Kubernetes resources
Requests certificates via ACME
Automatically renews them
Injects them into pods and services

The private CA becomes an identity backend for the cluster, enforcing policy while remaining invisible to developers.

Service Meshes and Continuous Identity

Service meshes push this model further. They treat certificates as short-lived service identity tokens, rotating them frequently and enforcing mTLS by default.

In these systems:

Certificates may live for minutes or hours
Rotation is constant
Human involvement is zero

ACME’s asynchronous, idempotent model fits this perfectly.

CI/CD and Ephemeral Trust

CI pipelines benefit even more. Instead of long-lived secrets:

A build job authenticates via OIDC
Receives a short-lived certificate
Uses it to access internal systems
Expires automatically

This eliminates secret sprawl and dramatically reduces lateral movement risk.

11. Observability, Auditing, and Operational Reality

A private CA is a single point of trust. If it becomes unreliable, the entire infrastructure degrades.

Observability must therefore be designed in from day one.

What Must Be Observable

At a minimum, operators must be able to answer:

How many certificates are being issued?
Who is requesting them?
Which policies are being applied?
Why are requests failing?

Metrics, logs, and traces should cover:

ACME request lifecycles
Challenge validation outcomes
Policy evaluation decisions
Signing latency

Auditing as a Security Primitive

Audit logs are not compliance theater. They are the only way to reconstruct trust decisions after an incident.

A complete audit trail should capture:

Identity of the requester
Requested identifiers
Applied policy
Issuing CA
Timestamp and validity

Without this data, post-incident analysis becomes impossible.
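
The fields listed above map naturally onto one structured log line per issuance. This is a minimal sketch under that assumption; the AuditRecord field names and the sample values are hypothetical, not a schema used by any particular CA.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    requester: str        # identity of the requester
    identifiers: tuple    # requested identifiers
    policy: str           # applied policy
    issuing_ca: str       # issuing CA
    issued_at: str        # timestamp (ISO 8601, UTC)
    not_before: str       # validity start
    not_after: str        # validity end


def to_log_line(record: AuditRecord) -> str:
    """Serialize deterministically so lines can be diffed and hashed."""
    return json.dumps(asdict(record), sort_keys=True)


record = AuditRecord(
    requester="k8s:payments/api",
    identifiers=("api.payments.svc.cluster.local",),
    policy="payments-default",
    issuing_ca="issuing-ca-1",
    issued_at="2024-01-01T00:00:00Z",
    not_before="2024-01-01T00:00:00Z",
    not_after="2024-01-02T00:00:00Z",
)
line = to_log_line(record)
print(line)
```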

12. Scaling and High Availability: Why PKI Breaks Under Load

Scaling a private CA is fundamentally different from scaling stateless application services. PKI systems are built around long-lived trust assumptions, stateful cryptographic material, and global consistency requirements. These characteristics make naive scaling strategies dangerous.

Scaling the ACME Control Plane

The ACME server itself is typically the easiest component to scale horizontally. ACME requests are largely CPU-light and can be handled by stateless API servers as long as all durable state — accounts, orders, authorizations, nonces — is stored in a shared, consistent backend.

However, ACME traffic patterns are deceptive. Certificate renewals tend to cluster:

Service mesh rollouts
Kubernetes node churn
Mass restarts after outages
Clock skew or misconfigured renewal windows

When thousands of clients attempt renewal simultaneously, ACME servers must withstand retry storms. Clients aggressively poll order status and retry failed requests, often in ways that amplify load rather than dampen it.

This makes idempotency and backpressure essential design requirements. Every ACME operation must be safe to retry, and the server must intentionally slow clients down when necessary — even if that means temporarily refusing issuance.
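
Backpressure of this kind is commonly implemented with a per-account token bucket whose rejection carries a delay the server can advertise in a Retry-After header. The sketch below is one such bucket. The class name, rates, and injected clock are assumptions for illustration, and the clock is faked so the example is deterministic.

```python
import math
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, now=time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self._now = now
        self._tokens = float(burst)
        self._last = now()

    def try_acquire(self) -> int:
        """Return 0 if the request may proceed, else whole seconds to wait."""
        current = self._now()
        self._tokens = min(self.burst, self._tokens + (current - self._last) * self.rate)
        self._last = current
        if self._tokens >= 1:
            self._tokens -= 1
            return 0
        return math.ceil((1 - self._tokens) / self.rate)


# Fake clock for a reproducible demonstration.
clock = [0.0]
bucket = TokenBucket(rate=1.0, burst=2, now=lambda: clock[0])
results = [bucket.try_acquire() for _ in range(3)]  # third request is throttled
clock[0] = 1.0                                      # one second passes
results.append(bucket.try_acquire())
print(results)
```

A nonzero result would typically be returned to the client alongside a rateLimited error so that well-behaved clients back off instead of retrying immediately.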

Scaling the CA Signing Layer

The true bottleneck in any PKI system is not HTTP — it is cryptographic signing.

Issuing certificates involves:

Accessing private CA keys
Performing cryptographic operations
Writing audit records

When keys are protected by HSMs or cloud KMS systems, signing throughput becomes limited and latency increases. These systems trade performance for security, and that trade-off cannot be ignored.

At scale, successful designs introduce:

Signing queues that smooth bursts
Explicit rate limits per account or identity
Separation between policy evaluation and signing
Multiple issuing CAs to shard load

Without these measures, PKI systems fail not catastrophically, but gradually and unpredictably, leading to timeouts, partial outages, and cascading renewal failures.

High Availability Without Trust Inconsistency

High availability in PKI is not just about uptime — it is about trust continuity.

Failover scenarios must preserve:

Issuer chains
Serial number uniqueness
Certificate validity semantics

Restoring from backups incorrectly can result in:

Duplicate serial numbers
Reissued certificates that should no longer exist
Inconsistent revocation state

These failures are especially dangerous because they often go unnoticed until clients begin rejecting certificates long after the root cause has occurred.
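
One common way to reduce the duplicate-serial risk is to draw serial numbers from a cryptographic random source instead of a counter, since a counter can be rewound by restoring a stale backup. The sketch below assumes that approach. The function name and the 127-bit size are illustrative choices; RFC 5280 only requires a positive integer of at most 20 octets.

```python
import secrets


def new_serial() -> int:
    """Return a random, positive certificate serial number.

    127 random bits keep the value positive within 16 octets, well inside
    the 20-octet limit. Zero is rejected because serials must be positive.
    """
    while True:
        serial = secrets.randbits(127)
        if serial > 0:
            return serial


serial_a = new_serial()
serial_b = new_serial()
print(hex(serial_a), hex(serial_b))
```

Random serials make collisions statistically negligible, but they do not remove the need to record every issued serial in the audit trail.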

13. Failure Modes and Security Pitfalls: How PKI Systems Actually Fail

Most PKI incidents are not caused by attackers. They are caused by operators underestimating complexity.

Silent Mis-Issuance: The Most Dangerous Failure

The most severe PKI failures are not outages — they are silent mis-issuance events. A certificate is issued successfully, systems continue to operate normally, but trust boundaries have been violated.

This can happen when:

Policies are overly permissive
Identity bindings are incomplete
ACME accounts gain broader scope than intended

Unlike downtime, mis-issuance is invisible. Detection often occurs only after lateral movement or forensic analysis.

DNS and Network Assumptions That Don’t Hold

DNS-based challenges frequently fail in private environments due to:

Split-horizon DNS inconsistencies
Caching behavior in resolvers
Delayed propagation across zones

Similarly, TLS-ALPN-01 can conflict with existing TLS termination infrastructure, particularly when load balancers or proxies intercept handshakes unexpectedly.

These failures are rarely reproducible in test environments, making them particularly difficult to diagnose.

Trust Store Drift and Rotation Failures

Trust roots must be rotated eventually. When they are, systems often fail not because rotation is impossible, but because trust distribution is inconsistent.

Some workloads update trust bundles automatically. Others bake them into container images. Still others rely on OS-level trust stores.

If even one critical system lags behind, trust fractures — and debugging becomes a nightmare.
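
Drift of this kind can be detected by comparing certificate fingerprints between the expected trust bundle and what a given system actually ships. The sketch below does that with the standard library. The function names are hypothetical, and the "certificates" are placeholder bytes wrapped in PEM armor rather than real X.509 data.

```python
import hashlib
import re
import ssl

_PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def bundle_fingerprints(pem_bundle: str) -> set:
    """SHA-256 fingerprint of every certificate block in a PEM bundle."""
    fingerprints = set()
    for block in _PEM_BLOCK.findall(pem_bundle):
        der = ssl.PEM_cert_to_DER_cert(block)
        fingerprints.add(hashlib.sha256(der).hexdigest())
    return fingerprints


def drift(expected: set, actual: set) -> dict:
    """Report roots a system lacks and roots it should not have."""
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Placeholder bytes stand in for DER certificates (NOT real certificates).
root_a = ssl.DER_cert_to_PEM_cert(b"placeholder root A")
root_b = ssl.DER_cert_to_PEM_cert(b"placeholder root B")

expected = bundle_fingerprints(root_a + root_b)  # bundle after rotation
image_bundle = bundle_fingerprints(root_a)       # stale container image
report = drift(expected, image_bundle)
print(report)
```

Running a comparison like this across images, hosts, and runtimes before retiring an old root shows which systems would break first.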

14. Adjacent Systems and the Convergence of Identity

Private ACME does not exist in isolation. It sits at the intersection of multiple identity systems that evolved independently.

ACME and SPIFFE: Complement, Not Competition

SPIFFE provides workload identities without exposing raw X.509 complexity to applications. However, many ecosystems — TLS, proxies, databases — still require standard certificates.

In practice, organizations often bridge SPIFFE identities into X.509 via ACME-based systems, using ACME as a translation layer rather than a replacement.

OAuth2, OIDC, and Certificate Minting

OAuth2 and OIDC excel at authentication but do not establish transport-level trust. ACME bridges this gap by converting ephemeral tokens into cryptographic identities.

This allows:

Short-lived certificates bound to token claims
Strong identity without static secrets
Unified identity across protocols

SSH and TLS: Parallel Evolutions

SSH certificate authorities solved many of these problems earlier: short-lived credentials, centralized policy, identity-based access.

The convergence of SSH CAs and ACME-based PKI points toward a future where all infrastructure access is mediated through a single identity control plane.

15. The Future of Private PKI: Continuous Trust, Not Static Certificates

The trajectory of private PKI is clear: certificates are becoming short-lived, automated, and contextual.

ACME is expanding beyond web servers into:

Device identity
Code signing
Email (S/MIME)
Secure workload communication

At the same time, identity verification is shifting toward attestation-first models, where trust is derived from hardware-backed evidence rather than configuration alone.

In this future:

Certificates may live minutes instead of days
Trust is continuously re-evaluated
Compromise windows shrink dramatically
Human-managed secrets disappear

ACME’s role evolves from “certificate protocol” to trust automation framework.

16. Conclusion: Private CA as a Trust System, Not a Tool

A private CA with ACME is not something you “set up and forget.” It is a living trust system embedded in your infrastructure.

When designed well, it:

Eliminates manual certificate workflows
Enforces identity and policy consistently
Scales with infrastructure growth
Reduces security risk rather than introducing it

When designed poorly, it becomes an invisible single point of failure whose mistakes propagate silently.

ACME provides the mechanics. The real engineering work lies in policy design, identity binding, operational discipline, and continuous verification.

Private PKI/CA, done right, is not legacy infrastructure — it is the backbone of Zero Trust systems.

17. BastionXP: Bringing Modern ACME-Based PKI to Zero-Trust Infrastructure

BastionXP is a private PKI/CA with comprehensive ACME server support, designed specifically to automate and operationalize the architecture discussed throughout this article.

By implementing a modern ACME-compatible Registration Authority (RA), backed by a hardened Private CA, BastionXP enables organizations to:

Issue short-lived X.509 certificates for services, workloads, and devices
Integrate identity signals (OIDC, Kubernetes, mTLS, device identity) directly into certificate policy
Enforce fine-grained issuance controls without exposing CA signing keys
Work seamlessly with existing ACME clients and automation tooling
Scale certificate management across DevOps, IoT, and zero-trust environments

Rather than treating PKI as a static trust artifact, BastionXP treats certificates as dynamic identity credentials — aligned with modern security models and automation-first infrastructure.

If you are:

Building internal mTLS-secured services
Automating certificate issuance for cloud-native workloads
Designing a zero-trust or device-identity architecture
Looking to replace manual or legacy enterprise PKI
In need of an ACME-compatible Private CA that works with real-world tooling

Explore how BastionXP can simplify and modernize your Private PKI stack — without compromising security, compatibility, or control.