The 6Ws Methodology - Part 6

Where: the hosting strategy

Where defines how and where the software runs. Reliability, cost, scale, security, and developer experience all meet at this layer.

Enterprise Automation Services · 7 min read

While the earlier Ws decide what gets built, Where decides how and where it runs. This is the layer where reliability, cost, scale, security, and developer experience meet. A solid hosting strategy is more than a cloud provider and a cluster size: it balances simplicity, resilience, and affordability without compromising performance.

Where ensures the solution is not only functional but operationally viable.

What you define here#

  • Hosting platform and orchestration strategy
  • Deployment automation and CI/CD pipelines
  • Backup, rollback, and recovery strategies
  • Observability and operational tooling
  • Cost-aware scaling and environment isolation
  • Security and access controls

1. Platform foundations#

The default platform stack for most solutions we design:

  • Kubernetes via a managed service such as AKS
  • GitHub Actions for CI/CD
  • Docker containers for every service
  • Helm for versioned deployments
  • Terraform or Pulumi for cloud infrastructure provisioning

Three environments as a baseline:

  • Dev. Fast feedback, low cost.
  • Staging. Mirrors production, includes pre-release load.
  • Production. High availability, strict policies, HPA enabled.

2. CI/CD and deployment#

Pipeline structure on GitHub Actions:

On push to main or PR:

  • Lint, test, static analysis
  • Build Docker image
  • Security scan (e.g. Trivy)

On merge to main:

  • Push image to registry
  • Deploy to AKS via Helm
  • Notify on Slack or Teams
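The two-stage pipeline above can be sketched as a single GitHub Actions workflow. This is a minimal sketch, not the original pipeline: the release name `myapp`, the registry host `registry.example.com`, and the `SLACK_WEBHOOK` secret are placeholders, and the real workflow would also need cluster credentials (an `azure/login` or kubeconfig step, omitted here).

```yaml
name: ci-cd

on:
  pull_request:
  push:
    branches: [main]

jobs:
  verify:
    # Runs on every push and PR: lint, test, build, scan.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint, test, static analysis
        run: make lint test        # placeholder for the project's own targets
      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Security scan with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}

  deploy:
    # Runs only on main, once verification passes. The image is rebuilt
    # here because jobs do not share a Docker daemon; a real pipeline
    # would push from `verify` and pull by digest instead.
    if: github.ref == 'refs/heads/main'
    needs: verify
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image to registry
        run: |
          docker build -t registry.example.com/myapp:${{ github.sha }} .
          docker push registry.example.com/myapp:${{ github.sha }}
      - name: Deploy to AKS via Helm
        run: |
          helm upgrade --install myapp ./charts/myapp \
            --set image.tag=${{ github.sha }}
      - name: Notify
        if: always()
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          curl -X POST -H 'Content-Type: application/json' \
            -d "{\"text\": \"Deploy of myapp: ${{ job.status }}\"}" \
            "$SLACK_WEBHOOK"
```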

Repository shape:

  • Infra-as-code repo for Terraform modules
  • App repos for service code, Dockerfiles, and CI/CD YAML
  • Ops repo for runbooks, alerts, and docs

Rollback: Helm makes it a one-liner back to the last known-good chart, and every change is gated by a versioned values.yaml.
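The one-liner can also be wrapped in a manually triggered workflow so that rollback is a button rather than a terminal session. A sketch, assuming a release named `myapp` and cluster credentials already configured on the runner; `helm rollback` with no revision argument returns to the previous release:

```yaml
name: rollback

on:
  workflow_dispatch:
    inputs:
      revision:
        description: "Helm revision to roll back to (blank = previous release)"
        required: false
        default: ""

jobs:
  rollback:
    runs-on: ubuntu-latest
    steps:
      - name: Roll back to last known-good release
        # --wait blocks until the rolled-back pods report ready,
        # so a failed rollback fails the workflow visibly.
        run: helm rollback myapp ${{ inputs.revision }} --wait
```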

3. Scalability and cost#

Smart scaling:

  • HPA based on CPU, memory, and custom Prometheus metrics
  • Node pools split so spot instances handle stateless work and dedicated nodes carry stateful services
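An HPA that mixes resource and custom metrics looks like the sketch below. The Deployment name `api`, the replica bounds, and the `http_requests_per_second` metric are illustrative; the custom metric assumes something like prometheus-adapter is exposing Prometheus series through the custom metrics API.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Scale on CPU utilisation across the pod set...
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # ...and on a custom Prometheus metric served via an adapter.
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
```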

Redis in front of MySQL:

  • Cache first for session data, token lookups, config blobs
  • Redis memory is cheaper than scaling MySQL read replicas

Static assets:

  • Cloud CDN for React/Next.js bundles
  • Edge caching for public images and documentation

4. Backup and disaster recovery#

Backups:

  • MySQL snapshotted on a schedule via the cloud provider
  • Redis snapshots exported daily
  • Persistent volumes snapshotted and versioned
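The daily Redis export can be a plain Kubernetes CronJob. This is a sketch under stated assumptions: the Redis service is reachable at host `redis`, and the upload step is a placeholder for whatever blob-storage copy the cloud provider offers (azcopy, `aws s3 cp`, and so on).

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-snapshot-export
spec:
  schedule: "0 3 * * *"          # daily at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: export
              image: redis:7
              command:
                - sh
                - -c
                # `redis-cli --rdb` pulls an RDB snapshot from the server
                # onto this pod's filesystem; the upload is left as a stub.
                - |
                  redis-cli -h redis --rdb /backup/dump-$(date +%F).rdb
                  echo "upload /backup/dump-$(date +%F).rdb to blob storage here"
```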

Disaster recovery:

  • DR environments defined in Terraform
  • Automated restore jobs tested quarterly
  • Read-only recovery access for diagnostic teams

5. Network and access control#

Core principles:

  • Zero-trust inside the cluster
  • Namespace isolation and RBAC in Kubernetes
  • Firewall and NAT egress restrictions
  • Service mesh (Istio or similar) for mTLS and traffic shaping
  • Cloud IAM integrated with GitHub OIDC for secure deploy permissions
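Zero-trust inside the cluster starts with a default-deny NetworkPolicy per namespace, plus explicit allow-lists. A sketch, assuming a hypothetical `payments` namespace with `api` and `mysql` pod labels:

```yaml
# Default-deny: no pod in the namespace may send or receive traffic
# unless a more specific NetworkPolicy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}              # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Explicit allow: API pods may reach MySQL on 3306, nothing else.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-to-mysql
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: mysql
      ports:
        - protocol: TCP
          port: 3306
```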

6. Observability#

Our default stack:

  • Prometheus + Grafana for metrics
  • Loki or ELK for logs
  • OpenTelemetry for distributed traces

Strategy:

  • SLOs per critical endpoint
  • Synthetic tests via GitHub Actions or a third-party uptime tool
  • Custom alerts piped to Slack or OpsGenie
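An SLO-backed alert, expressed as a Prometheus Operator rule, can look like the sketch below. The endpoint, metric names, and thresholds are hypothetical: it assumes a 99.9% success SLO on a `/checkout` path and uses a common fast-burn multiplier (roughly 14x the allowed error rate over one hour) to page before the monthly budget is gone.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: checkout-slo
spec:
  groups:
    - name: slo.rules
      rules:
        - alert: CheckoutErrorBudgetFastBurn
          # 1h error ratio for /checkout, compared against 14.4x the
          # 0.1% budget implied by a 99.9% SLO.
          expr: |
            sum(rate(http_requests_total{path="/checkout", status=~"5.."}[1h]))
              /
            sum(rate(http_requests_total{path="/checkout"}[1h]))
              > (14.4 * 0.001)
          for: 5m
          labels:
            severity: page
          annotations:
            summary: "Checkout is burning its error budget too fast"
```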

7. Choosing the right hosting shape#

Not every workload wants the same home. A pragmatic Where starts by matching the shape of the work to the shape of the platform.

  • Serverless functions are excellent for bursty, stateless, short-lived work. API endpoints that run briefly, webhooks, scheduled jobs, glue code between systems. They become expensive or awkward when requests are long-lived or when warm state matters.
  • Managed containers (Fluid Compute, Cloud Run, App Runner, Fargate) sit in the sweet spot for most web services. You get orchestration without running a cluster yourself. Cold starts are increasingly rare and pricing is sane.
  • Kubernetes is the right answer when you have enough services, enough traffic, or enough compliance constraints that you need full control. It is the wrong answer when "we heard it was standard" is the strongest argument for it.
  • Dedicated VMs still win for stateful workloads, specialist databases, and anything with heavy licensing. Boring, unglamorous, and often the cheapest correct answer.
  • The edge is genuinely useful for latency-sensitive read paths and personalisation. It is a poor fit for anything that needs a hot database connection pool.

The unwritten rule: the team that has to operate a platform should have a say in whether it is chosen. An architect who picks Kubernetes and then hands it to a two-person ops team is not making an architecture decision. They are making a hiring decision on someone else's behalf.

8. Environment isolation and cost control#

| Environment | Key features           | Cost controls                                            |
| ----------- | ----------------------- | -------------------------------------------------------- |
| Dev         | Auto-shutdown nightly   | Ephemeral DBs + spot instances                           |
| Staging     | Mirrors production      | HPA + daily scale-in jobs                                |
| Production  | HA setup                | Autoscaling + reserved base capacity + overrun alerting  |

Spot instances cut cost on low-priority batch jobs. Redis saves on repetitive query costs. API request limits protect downstream spend.
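The nightly dev shutdown is itself just a CronJob. A sketch, assuming a `dev` namespace and a `scaler` service account with RBAC permission to patch Deployments; a mirror-image morning job (or the first CI deploy of the day) scales things back up.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-nightly-shutdown
  namespace: dev
spec:
  schedule: "0 20 * * 1-5"     # weekday evenings at 20:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler    # needs RBAC to scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: scale-down
              image: bitnami/kubectl:latest
              command:
                - sh
                - -c
                # Scale every Deployment in the namespace to zero replicas.
                - kubectl scale deployment --all --replicas=0 -n dev
```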

9. Operational ergonomics#

A hosting choice is also a developer-experience choice. If deploying, debugging, or rolling back is painful, the pain compounds every sprint.

Signs the ergonomics are wrong:

  • A deploy takes more than fifteen minutes for a code change that touches one service.
  • Engineers ask "did that deploy actually go out?" and have to check three dashboards to find out.
  • Rollbacks require a manual runbook instead of a button or a git revert.
  • Local development does not resemble production closely enough to catch integration bugs before merge.
  • On-call engineers cannot reproduce a production issue because observability is missing or locked behind a paywall.

Investing in these is never glamorous and always pays off. The best engineering teams we work with treat deploy speed and rollback confidence as product features, not internal plumbing.

10. A real-world failure mode#

A warning from a past engagement. A team had a beautiful Where on paper: Kubernetes, multi-region, Istio, GitOps, everything. In reality they had one engineer who understood it all and three who were afraid to touch it. When the one engineer took paternity leave, a routine certificate rotation took down the cluster for six hours because no one else could follow the runbook.

The Where was not wrong in the abstract. It was wrong for the team that had to run it. When we came in, we did not recommend simpler infrastructure for its own sake. We recommended simpler infrastructure because the team that had to operate it told us, plainly, that they could not sustain the current setup. That feedback should have arrived before the first Helm chart was written.

A good Where optimises not just for the system, but for the humans who will keep it alive at 2am on a bank holiday.

11. Questions to close the Where#

Before declaring the hosting strategy done, the team should be able to answer each of these without reaching for a runbook.

  • How long does a full rebuild of production from source take, and when did we last test that?
  • What is the blast radius if our single cloud region has an outage?
  • What is our monthly spend by service, and which two line items dominate the bill?
  • What happens to in-flight requests when we deploy?
  • Where do our secrets live, and who can see them?
  • How do we know if a background job silently stopped running yesterday?
  • If the engineer who set this up left tomorrow, what would break first?

If any of those answers are "I'm not sure", the Where is not finished. These are not edge cases. They are Tuesday-afternoon realities. A hosting strategy that cannot answer them has not earned the word "strategy".

Summary#

Where ensures that the technical design in How can survive real-world pressure. With smart defaults, tested recovery plans, and continuous automation you can deploy confidently, scale responsively, recover gracefully, and control cost without compromising quality.

The best hosting strategy is the one your team can operate, evolve, and afford.

Like how we think? See how we work.

Book a strategy call and we will map the shortest path from where you are to where you want to be.