physical servers running RKE2

Welcome! This is the NixOS flake defining most of the infrastructure hosting https://maevi.net & subdomains :3

Everything is hosted on an RKE2 cluster and fully defined in Nix; all images and helm charts are fetched using docker.pullImage, and the whole cluster runs as if it were airgapped[1] (except for services which explicitly need internet access, like searxng). Everything is also automatically updated through renovate (including nix hashes! just don't look at the .renovaterc script, it's a mess...).

In addition to selfhosting services, my end goal for this is a pretty much fully offline-ready infrastructure that doesn't rely on the internet for anything but updating itself (& interop, obviously I still need stuff like CA-approved certs for the public TLS proxy[2]). Thanks to the power of Nix, everything is built & cached via CI into a local cache, which means that theoretically it should be possible to install new systems, create configs & generally do everything except for downloading brand new software/updates fully offline.

See the core flake for the general structure; this is the non-standard stuff:

  • cluster/: az.cluster defs, options to be set cluster-wide
  • hosts/: instead of each dir being a host, each dir is a cluster with a nodes/ subdir
  • sops/: a private submodule with all the secrets, passwords, etc, decryptable with a machine-local age.key (also stored in bitwarden for backup reasons)
    • not mirrored to codeberg, but most of these are just randomly-generated secrets anyways

The core infra is IPv6-only routed through a wireguard tunnel (courtesy of as200242!), but I also have a VPS for the domain's secondary nameserver, uptime page and a mirror of the root site on legacy IP[3] for the people with ISPs still stuck in the 90s.

Network layout:

  • 2a14:6f44:5608::/48 - public prefix
    • :0000::/52 - k8s clusters
      • :000::/56 - primary.k8s.maevi.net
        • :00::/64 - reserved static IPs & ExternalIPs for gateways
          • ::1 - envoy gateway
          • ::53 - knot.app-nameserver.svc, aka ns1.maevi.net
          • ::1:53 - ^^^'s pod, since calico's ipAddrs can't be set for only one IP family
          • ::64 - internal NAT64 tayga address
          • ::ffff - k8s apiserver VIP
          • :25::/72 - mail servers
            • :25::ffff - dovecot, imap.maevi.net
            • :25::fffe - rspamd, firewalled & internal static IP so postfix doesn't have to look it up
            • everything else from :25::1 up - postfix pods (mx*.maevi.net)
        • :01::/64 - pods CIDR
        • :02::/64 - services CIDR (though only the first /112 of that is actually used because RKE2)
        • :f0::/60 - node addrs, really a /64 but reserved as /60 for possible future routing shenanigans
          • :f0::1 - astra
    • :1000::/52 - misc personal devices - desktops/laptops/etc
  • fd33:7b36:fc28::/48 - ULA prefix routed through mullvad
    • uses same addressing scheme as public prefix, though only the /64 pod CIDRs are actually used
  • 2a01:4f9:c012:dc23::1/64 - ns2.maevi.net, also hosts the legacy mirror + status page & proxies IPv4 mail
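
A quick way to sanity-check the layout above is Python's ipaddress module. This is only a sketch: the full prefixes are my reading of the nibble shorthand (e.g. ":01::/64" taken to mean 2a14:6f44:5608:1::/64), and the to_ula helper is a hypothetical illustration of "same addressing scheme", not something from the flake. The mail /72 is left out because its shorthand is ambiguous.

```python
import ipaddress

# Prefixes copied from the layout above; the expanded forms of the
# nibble shorthand (":01::/64" -> "...:5608:0001::/64") are an assumption.
PUBLIC = ipaddress.ip_network("2a14:6f44:5608::/48")
ULA = ipaddress.ip_network("fd33:7b36:fc28::/48")

LAYOUT = {
    "k8s clusters": "2a14:6f44:5608:0000::/52",
    "personal devices": "2a14:6f44:5608:1000::/52",
    "primary cluster": "2a14:6f44:5608:0000::/56",
    "static IPs": "2a14:6f44:5608:0000::/64",
    "pods": "2a14:6f44:5608:0001::/64",
    "services": "2a14:6f44:5608:0002::/64",
    "nodes": "2a14:6f44:5608:00f0::/60",
}
NETS = {name: ipaddress.ip_network(cidr) for name, cidr in LAYOUT.items()}


def to_ula(net):
    """Map a public subnet onto the same offset inside the ULA prefix."""
    offset = int(net.network_address) - int(PUBLIC.network_address)
    base = ipaddress.ip_address(int(ULA.network_address) + offset)
    return ipaddress.ip_network(f"{base}/{net.prefixlen}")


# First /112 of the services /64, the only part the layout says is used.
services_used = next(NETS["services"].subnets(new_prefix=112))

if __name__ == "__main__":
    for name, net in NETS.items():
        # Every block should sit inside the public /48.
        assert net.subnet_of(PUBLIC), name
        print(f"{name:18} {net}")
    print("services in use   ", services_used)
    print("ULA pod CIDR      ", to_ula(NETS["pods"]))
```

ip_network is strict by default, so a prefix with stray host bits raises ValueError, which catches typos when adding a new block.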

Guides for future me:

Setting up a new server:

  1. add an entry to the relevant hosts/ dir
  2. create dir in hosts/*/nodes/, note that domain and hostName are defined automatically
  3. generate a new age.key, re-encrypt all the stuff in sops/
  4. done!

Setting up a domain:

  1. configure stuff in the relevant hosts/ dir
  2. if selfhosting the nameservers:
    • if <2 nodes with separate public IPs, set up VPSes (see nixos-vps) & add entries to clusters.nix
      • exec into the pod & run knotc status cert-key for the primary's pubkey
    • set up glue & DS records with the registrar, exec into the pod and run keymgr <zone> ds for the DS stuff
  3. done!

Setting up cluster from scratch:

  1. enable nothing but the az.server.rke2 modules & az.cluster.core.nameserver
  2. wait for nameserver to come online, follow step 2 of "Setting up a domain"
  3. enable core.envoyGateway & core.auth with authelia.domains = [], log in to lldap using the init-passwd, then:
    • create lldap-admin account with lldap_admin group
    • delete default admin account
    • create authelia account with lldap_password_manager group
    • create admin group, user account(s)
  4. enable core.mail & set core.auth.authelia.domains, setup 2FA
  5. enable everything else as needed, manual steps for specific services:
    • forgejo: temporarily modify gitea.admin & enable internal auth in the chart's valuesContent, delete account when done with setup
      • OIDC: additional scopes email profile groups, auto-discovery URL https://auth.maevi.net/.well-known/openid-configuration
    • navidrome: no default auth, IMMEDIATELY connect & create admin user
    • woodpecker: create Integrations > Applications in forgejo (https://woodpecker.maevi.net/authorize), modify sops secrets
      • create woodpecker-ci user in forgejo & add as collaborator to repos
      • create secrets in woodpecker:
        • REMOTE_URL: https://woodpecker-ci:<passwd>@git.maevi.net/<repo>, available for appleboy/drone-git-push per repo
        • NIKS3_AUTH_TOKEN: self-explanatory, available organization-wide
      • create daily cron in each repo
    • grafana: delete default admin user
    • renovate bot: create user (restricted account), add as collaborator
      • log in & create personal token in Applications, put into sops - just the password might also work
    • niks3: add NIKS3_AUTH_TOKEN var to woodpecker
    • garage:
      # garage setup
      alias garage 'kubectl exec -it -n app-garage garage-0 -- /garage -c /config/config.toml'
      garage layout assign -z primary -c 2T <node_id>
      garage layout apply --version 1
      
      # nix cache setup
      garage bucket create cache
      garage key create cache-rwo # add to niks3 sops
      garage bucket allow --read --write --owner cache --key cache-rwo
      garage bucket website --allow cache
      nix-store --generate-binary-cache-key cache.s3.maevi.net cache.key cache.pub
      # add cache.key to niks3 sops & cache.pub to core flake + scripts/etc
      
    • jellyfin: enable legacyIP for the media namespace, run the initial setup & enable IPv6
    • files: first access sets admin OIDC username, all users also have to be created manually - note that / means actual root, use /srv
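
Note on the woodpecker REMOTE_URL secret above: since the password sits in the userinfo part of the URL, characters like @, : or / in it need to be percent-encoded or the URL won't parse as intended. A minimal sketch of building the value; remote_url is a hypothetical helper, and the password and repo path in the usage line are made-up placeholders:

```python
from urllib.parse import quote


def remote_url(user, password, host, repo):
    """Build https://<user>:<passwd>@<host>/<repo> with the credentials
    percent-encoded (safe='' so that '/', ':' and '@' are escaped too)."""
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}/{repo}"


if __name__ == "__main__":
    # Placeholder password and repo path, for illustration only.
    print(remote_url("woodpecker-ci", "p@ss/w:rd", "git.maevi.net", "owner/repo"))
```

A randomly generated alphanumeric password sidesteps the problem entirely.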

  1. See config/rke2/default.nix, but the TLDR is that it's possible to use the RKE2 embedded registry to completely disable pulling any images and rely on those preinstalled on the node(s). Runtime airgappiness is handled with network policies.

  2. Here I specifically mean only the public proxy. At the time of writing this isn't implemented yet, but eventually I'd like to run a local step-ca instance for local network connections.

  3. Version four of the Internet Protocol as defined in RFC 791 (three digit RFC! and we're still using it in $YEAR).