Deploying a Phoenix app on Fly.io

I've been learning Elixir for a side project, and I'm interested in deploying a basic Phoenix app backed by Postgres on Fly.io. I ran into several problems along the way, so let's solve them together and make the path smoother for anyone else doing the same. Here's Fly's intro guide if you'd like to follow along at home.

Quick learnings if you're in a rush:

  • mix ecto.create requires a user "postgres" with password "postgres" to exist for local development (maybe just me)
  • mix phx.gen.release attempts to pick Docker images based on Elixir and Erlang version pairs
    • not all pairs of Elixir and Erlang versions have published images
    • version pairs are matched as literal strings, not according to SemVer expectations
    • developers seem to run into this fairly often on the forums - (example 1, example 2)
  • Elixir doesn't have a way of specifying required Erlang versions
    • using the most recent Erlang and Elixir versions may save you some trouble, but...
    • I learned this the hard way, so let's see what that looks like 🙂

Creating the Phoenix app

As a baseline, let's assume Elixir, Erlang, and Postgres are installed. Any Postgres from the past few years should do, since we only need the ability to create a database. I'm on 14.9, which is a little stale but perfectly functional.

Version management

I use asdf to manage installed Elixir and Erlang versions. Being able to support multiple versions of Elixir and Erlang can be quite useful - you might want to take the time to set up asdf if you're following along at home...!

You can use the following commands to see installed Elixir and Erlang versions (obligatory StackOverflow link):

$ elixir -v
Erlang/OTP 24 [erts-12.3.2.14] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]

Elixir 1.16.0 (compiled with Erlang/OTP 24)
$ erl -eval '{ok, Version} = file:read_file(filename:join([code:root_dir(), "releases", erlang:system_info(otp_release), "OTP_VERSION"])), io:fwrite(Version), halt().' -noshell
24.3.4

Elixir 1.16.0 and Erlang 24.3.4 working

As referenced by the guide, start by installing the Phoenix project generator (phx_new, which we'll use to scaffold the app) from Hex, the package manager for the Erlang ecosystem:

mix archive.install hex phx_new

We should now be able to create a basic app with the following command - go ahead and fetch and install dependencies during setup.

$ mix phx.new groundhog

...<omitted>...

Fetch and install dependencies? [Yn] y

...<omitted>...

We are almost there! The following steps are missing:

    $ cd groundhog

Then configure your database in config/dev.exs and run:

    $ mix ecto.create

Start your Phoenix app with:

    $ mix phx.server

And now let's initialize the local database and run the app on our machine.

$ cd groundhog && mix ecto.create
Compiling 15 files (.ex)
Generated groundhog app
The database for Groundhog.Repo has been created
$ mix phx.server
[info] Running GroundhogWeb.Endpoint with cowboy 2.10.0 at 127.0.0.1:4000 (http)
[info] Access GroundhogWeb.Endpoint at http://localhost:4000
[debug] Downloading esbuild from https://registry.npmjs.org/@esbuild/darwin-arm64/0.17.11

Rebuilding...

Done in 152ms.
[watch] build finished, watching for changes...

Pointing a browser tab at localhost:4000 shows a working homepage:

phoenix base homescreen

A bit easier on the eyes than Twitter Bootstrap

If you're seeing the homepage, feel free to skip ahead to "Running fly launch".

Possible trouble with ecto

I ran into trouble setting up the database:

$ mix ecto.create
==> groundhog
Generated groundhog app
** (Mix) The database for Groundhog.Repo couldn't be created: ERROR 42501 (insufficient_privilege) permission denied to create database

Ecto expects that there is a user "postgres" with a password "postgres" provisioned that can create databases. Without that user, ecto isn't able to create a database during the mix task.

We can see this by peeking at the development configuration file:

$ head config/dev.exs
import Config

# Configure your database
config :groundhog, Groundhog.Repo,
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  database: "groundhog_dev",
  stacktrace: true,
  show_sensitive_data_on_connection_error: true,

It might be that my installation of Postgres didn't come with the expected user, but it's a quick fix.

Fix: adding the expected user

You can take a look at users directly from a Postgres console:

$ psql
psql (14.9 (Homebrew))
Type "help" for help.

scott=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 scott     | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

\du asks Postgres to "list roles", but I find it easier to remember it as "describe users".

Add the expected user as follows:

scott=# CREATE ROLE postgres WITH PASSWORD 'postgres' CREATEDB LOGIN;
CREATE ROLE
scott=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Create DB                                                  | {}
 scott     | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

We should now be able to successfully build our local development database.

$ mix ecto.create
The database for Groundhog.Repo has been created

You can read more about Postgres terminal commands and database roles to see what other commands and options are available.
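If you'd rather script this fix than type it into a console, here's a rough sketch. The function name ensure_dev_role is made up, and it assumes psql can connect as a superuser and that a database named "postgres" exists (the usual default, but check your install).

```shell
# Create the "postgres" role only if it is missing.
# Assumptions: psql connects as a superuser, and a database named
# "postgres" exists; change -d if your setup differs.
ensure_dev_role() {
  exists="$(psql -d postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname = 'postgres'")"
  if [ "$exists" = "1" ]; then
    echo "role postgres already exists"
  else
    psql -d postgres -c "CREATE ROLE postgres WITH PASSWORD 'postgres' CREATEDB LOGIN;"
  fi
}
```

Calling ensure_dev_role a second time should be harmless, since it only creates the role when the lookup comes back empty.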

Onward to deployment!

Running fly launch

Now that we have an app that runs locally, our next step is to get it up and running on Fly.io. Let's return to the official guide.

Install flyctl and create an account as the guide suggests. You won't need a payment method to interact with the free tier of products, though adding one does make it easier to spin up more than one app while debugging.

With an account and the CLI installed, we can run the magic command:

fly launch

Note that I'm not going to attempt to attach Postgres (or tweak settings) at this point - our first goal is just to get a Dockerfile created and fly.toml configured that will allow us to build deployable images.

Here's what that looks like locally for me:

$ fly launch
Scanning source code
Resolving Hex dependencies...
Resolution completed in 0.119s
Unchanged:
  castore 1.0.5

...<omitted>...

All dependencies are up to date
Detected a Phoenix app
Creating app in /Users/scott/home/groundhog
We're about to launch your Phoenix app on Fly.io. Here's what you're getting:

Organization: Scott Gardner                   (fly launch defaults to the personal org)
Name:         groundhog-frosty-wildflower-753 (derived from your directory name)
Region:       Boston, Massachusetts (US)      (this is the fastest region for you)
App Machines: shared-cpu-1x, 1GB RAM          (most apps need about 1GB of RAM)
Postgres:     <none>                          (not requested)
Redis:        <none>                          (not requested)

? Do you want to tweak these settings before proceeding? No
Created app 'groundhog-frosty-wildflower-753' in organization 'personal'
Admin URL: https://fly.io/apps/groundhog-frosty-wildflower-753
Hostname: groundhog-frosty-wildflower-753.fly.dev
Set secrets on groundhog-frosty-wildflower-753: SECRET_KEY_BASE
Generating rel/env.sh.eex for distributed Elixir support
Preparing system for Elixir builds
Installing application dependencies
Running Docker release generator
Error: failed running /Users/scott/.asdf/shims/mix phx.gen.release --docker: exit status 1

The release mix task failed! If this command worked for you (read: generated a fly.toml and Dockerfile), you can skip to the last step, "Connecting Postgres".

Debugging mix phx.gen.release

Let's rerun the failed command (which looks like it's being run locally) and see if we learn anything.

$ mix phx.gen.release --docker
* creating rel/overlays/bin/server
* creating rel/overlays/bin/server.bat
* creating rel/overlays/bin/migrate
* creating rel/overlays/bin/migrate.bat
* creating lib/groundhog/release.ex

13:18:11.122 [debug] Fetching latest image information from https://hub.docker.com/v2/namespaces/hexpm/repositories/elixir/tags?name=1.16.0-erlang-24.3.4-debian-bullseye-
** (RuntimeError) unable to fetch supported Docker image for Elixir 1.16.0 and Erlang 24.3.4
    (phoenix 1.7.10) lib/mix/tasks/phx.gen.release.ex:231: Mix.Tasks.Phx.Gen.Release.gen_docker/1
    (phoenix 1.7.10) lib/mix/tasks/phx.gen.release.ex:76: Mix.Tasks.Phx.Gen.Release.run/1
    (mix 1.16.0) lib/mix/task.ex:478: anonymous fn/3 in Mix.Task.run_task/5
    (mix 1.16.0) lib/mix/cli.ex:96: Mix.CLI.run_task/2
    /Users/scott/.asdf/installs/elixir/1.16.0/bin/mix:2: (file)

Aha! An actual error. Looking at the context where it's raised in lib/mix/tasks/phx.gen.release.ex:

:error ->
  raise """
    unable to fetch supported Docker image for Elixir #{wanted_elixir_vsn} and Erlang #{otp_vsn}.
    Please check https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=#{otp_vsn}\
    for a suitable Elixir version
  """

The mix task is trying to find us a suitable Docker image at this URL with the following query params:

...elixir/tags?name=1.16.0-erlang-24.3.4-debian-bullseye-
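To poke at the same lookup by hand, here's a small sketch that rebuilds the query URL from a version pair. The helper name hexpm_tag_query is made up; the endpoint and the "debian-bullseye-" suffix are copied from the debug line above, so treat them as assumptions for any other Phoenix version.

```shell
# Hypothetical helper: rebuild the Docker Hub query that phx.gen.release logged.
# The endpoint and the "debian-bullseye-" suffix come from the debug output
# above and may differ for other Phoenix versions.
hexpm_tag_query() {
  elixir_vsn="$1"
  otp_vsn="$2"
  printf 'https://hub.docker.com/v2/namespaces/hexpm/repositories/elixir/tags?name=%s-erlang-%s-debian-bullseye-\n' \
    "$elixir_vsn" "$otp_vsn"
}

# Usage (needs network access):
#   curl -s "$(hexpm_tag_query 1.16.0 24.3.4)"
```

The response is JSON. As far as I can tell, an empty result list there is what makes the generator give up.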

Visiting the URL from the failed command shows us that there are no matching Docker images:

empty docker images list

Widening our search, if we remove the Erlang version from the tag parameter, we can see that for Elixir 1.16.0 there are many supported Erlang versions - just none that exactly match 24.3.4. Let's try Erlang 24.3.4.14, as there are several valid-looking targets for the release image picker.
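Part of what tripped me up is that Erlang patch releases can carry a fourth component, so "24.3.4" and "24.3.4.14" are simply different strings rather than a SemVer range and a member of it. A quick sketch of that distinction (looks_like_semver is a made-up name, and the check deliberately ignores pre-release and build suffixes):

```shell
# Rough check: does a version string have exactly three numeric components?
# SemVer pre-release and build suffixes are ignored on purpose.
looks_like_semver() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

looks_like_semver 1.16.0    && echo "1.16.0 fits"
looks_like_semver 24.3.4.14 || echo "24.3.4.14 does not fit"
```

Both echo lines print: the Elixir version has three components, while the Erlang one has four.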

Fix: pinning Elixir and Erlang versions

If you didn't set up asdf or kiex + kerl to manage Elixir and Erlang installations, you should set them up now. Globally ripping out and reinstalling languages will eventually bite you (worse than it might be right now), and adjusting language versions is relatively painless once asdf is configured. Here's a guide walking through asdf setup if you need it.

The following commands handle installing the target versions of Elixir and Erlang, and set the versions we'd like to use locally:

asdf install erlang 24.3.4.14
asdf install elixir 1.16.0
asdf local erlang 24.3.4.14
asdf local elixir 1.16.0
asdf reshim

asdf local <lang> <version> creates a .tool-versions file, which controls which version of a language the shell runs in that directory. asdf reshim regenerates asdf's shims so the newly installed versions are picked up. Take a second to validate your installs with the version-printing commands from earlier.
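The .tool-versions file itself is plain text, one "<tool> <version>" pair per line. Here's a minimal sketch for reading a pin back out of it; pinned_version is a hypothetical helper, not part of asdf.

```shell
# Print the version pinned for a tool in a .tool-versions file.
# Each line has the form "<tool> <version>", e.g. "erlang 24.3.4.14".
pinned_version() {
  tool="$1"
  file="${2:-.tool-versions}"
  awk -v tool="$tool" '$1 == tool { print $2; exit }' "$file"
}

# Usage, from the project directory:
#   pinned_version erlang
```

asdf current <lang> gives you the same information straight from asdf, which is probably what you want day to day.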

With updated versions, let's circle back to deployment.

Running the local release command again:

$ mix phx.gen.release --docker
* creating rel/overlays/bin/server
* creating rel/overlays/bin/server.bat
* creating rel/overlays/bin/migrate
* creating rel/overlays/bin/migrate.bat
* creating lib/groundhog/release.ex

14:25:59.726 [debug] Fetching latest image information from https://hub.docker.com/v2/namespaces/hexpm/repositories/elixir/tags?name=1.16.0-erlang-24.3.4.14-debian-bullseye-
* creating Dockerfile
* creating .dockerignore

Your application is ready to be deployed in a release!

...

Sweet! Our Elixir and Erlang versions found a published image for our Dockerfile.

Let's run fly launch again with our working Dockerfile (still skipping the database configuration).

asciinema recording of running fly launch

Two things of note happen:

  • A fly.toml file is successfully created, which handles app deployment for Fly.io
  • It's a little hard to catch, but DATABASE_URL isn't set during deployment
    • We see (RuntimeError) environment variable DATABASE_URL is missing.
    • This means the app can't start, since it requires a connection to Postgres

Progress! The last step is wiring up Postgres.

Connecting Postgres

If we instead run fly launch and enter "y" to edit the base configuration options, a browser tab opens where we can attach services like Redis and Postgres.

Let's select the minimal Postgres instance available and name the database sensibly.

fly database selection

If we confirm the settings, the terminal reports that the configuration worked. Except if we check the deployed website, we see a blank page.

Looking at the logs, we see a crashlooping service with the following message repeating:

$ fly logs
proxy[...] [info]Starting machine
app[...] [info][    0.047000] PCI: Fatal: No config space access function found
app[...] [info] INFO Starting init (commit: bfa79be)...
app[...] [info] INFO Preparing to run: `/app/bin/server` as nobody
app[...] [info] INFO [fly api proxy] listening at /.fly/api
app[...] [info]2024/02/02 17:14:33 listening on [fdaa:2:d6eb:a7b:1ed:dc47:95c3:2]:22 (DNS: [fdaa::3]:53)
runner[...] [info]Machine started in 438ms
proxy[...] [info]machine started in 1.442739911s
app[...] [info]17:14:36.207 [notice] Application groundhog exited: Groundhog.Application.start(:normal, []) returned an error: shutdown: failed to start child: DNSCluster
app[...] [info]    ** (EXIT) an exception was raised:
app[...] [info]        ** (UndefinedFunctionError) function :net_kernel.get_state/0 is undefined or private
app[...] [info]            (kernel 8.3.2.4) :net_kernel.get_state()
app[...] [info]            (dns_cluster 0.1.2) lib/dns_cluster.ex:163: DNSCluster.warn_on_invalid_dist/0
app[...] [info]            (dns_cluster 0.1.2) lib/dns_cluster.ex:84: DNSCluster.init/1
app[...] [info]            (stdlib 3.17.2.4) gen_server.erl:423: :gen_server.init_it/2
app[...] [info]            (stdlib 3.17.2.4) gen_server.erl:390: :gen_server.init_it/6
app[...] [info]            (stdlib 3.17.2.4) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
app[...] [info] WARN Reaped child process with pid: 363 and signal: SIGUSR1, core dumped? false
app[...] [info] WARN Reaped child process with pid: 365 and signal: SIGUSR1, core dumped? false
app[...] [info]{"Kernel pid terminated",application_controller,"{application_start_failure,groundhog,{{shutdown,{failed_to_start_child,'Elixir.DNSCluster',{undef,[{net_kernel,get_state,[],[]},{'Elixir.DNSCluster',warn_on_invalid_dist,0,[{file,\"lib/dns_cluster.ex\"},{line,163}]},{'Elixir.DNSCluster',init,1,[{file,\"lib/dns_cluster.ex\"},{line,84}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,423}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,390}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,226}]}]}}},{'Elixir.Groundhog.Application',start,[normal,[]]}}}"}
app[...] [info]Kernel pid terminated (application_controller) ({application_start_failure,groundhog,{{shutdown,{failed_to_start_child,'Elixir.DNSCluster',{undef,[{net_kernel,get_state,[],[]},{'Elixir.DNSCluster',warn_on_invalid_dist,0,[{file,"lib/dns_cluster.ex"},{line,163}]},{'Elixir.DNSCluster',init,1,[{file,"lib/dns_cluster.ex"},{line,84}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,423}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,390}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}}},{'Elixir.Groundhog.Application',start,[normal,[]]}}})
app[...] [info]Crash dump is being written to: erl_crash.dump...done
app[...] [info] INFO Main child exited normally with code: 1
app[...] [info] INFO Starting clean up.

Cleaned up for legibility

What's going on?

Debugging the crashloop

At this point, I posted on the Fly.io forums trying to understand the crashloop. Having made no functional changes to the app, I was growing increasingly confident that there was a problem outside my control.

Another developer and I went on a bit of a tangent about how the PORT environment variable is possibly used (or misused) during a deployment. Eventually, though, I realized that I could avoid the crashloop by turning off the DNSCluster, at which point the service deployed successfully.

In the following code (which handles starting the required processes to run the web app), having one of the child processes fail continually makes the service unrunnable:

# lib/groundhog/application.ex
defmodule Groundhog.Application do

  @impl true
  def start(_type, _args) do
    children = [
      GroundhogWeb.Telemetry,
      Groundhog.Repo,
      {DNSCluster, query: Application.get_env(:groundhog, :dns_cluster_query) || :ignore},
      {Phoenix.PubSub, name: Groundhog.PubSub},
      # Start a worker by calling: Groundhog.Worker.start_link(arg)
      # {Groundhog.Worker, arg},
      # Start to serve requests, typically the last entry
      GroundhogWeb.Endpoint
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Groundhog.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ...

Supervisor.start_link/2, from the 1.16.0 Elixir docs:

If the start function of any of the child processes fails or returns an error tuple or an erroneous value, the supervisor first terminates with reason :shutdown all the child processes that have already been started, and then terminates itself and returns {:error, {:shutdown, reason}}.

So our question becomes, why does starting DNSCluster consistently fail?

At that point, Chris McCord (the creator of Phoenix) joined the thread and recommended bumping the Erlang/OTP major version to get access to the missing function. He also patched DNSCluster to support Erlang/OTP 24 (link).

This is undeniably cool - sure, I just spent an embarrassing amount of time trying to brute force my way through a walkthrough containing roughly six total commands. But it feels neat that the creator of "the Rails of Elixir" still pokes around on the forums and makes sure people are having a good experience using the framework.

We're in the home stretch - let's quickly finish the walkthrough.

Fix: bumping Erlang

Bumping Erlang to the next major version does indeed resolve the issue. Installing the updated version of Erlang and reshimming:

asdf install erlang 25.3.2.8
asdf local erlang 25.3.2.8
asdf reshim

Keep in mind that these commands rely on your current working directory and path. You can put .tool-versions files directly in repos, but having one in your "home" or "code" folder is also a good idea since the Phoenix CLI relies on Elixir and Erlang versions too. You can also set global versions with asdf global <lang> <version>, which are overridden by "closer" .tool-versions files.
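To make the "closer file wins" idea concrete, here's a simplified sketch that walks up from a directory and prints the first .tool-versions it finds. nearest_tool_versions is a made-up name, and real asdf does more than this (it resolves each tool separately and has other fallbacks), so treat it as an illustration only.

```shell
# Walk up from a directory and print the first .tool-versions found.
# Simplification: asdf resolves each tool separately and has further
# fallbacks; this only illustrates the "closest file wins" idea.
nearest_tool_versions() {
  dir="$(cd "${1:-.}" && pwd)" || return 1
  while :; do
    if [ -f "$dir/.tool-versions" ]; then
      printf '%s\n' "$dir/.tool-versions"
      return 0
    fi
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}

# Usage:
#   nearest_tool_versions ~/code/groundhog
```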

At this point, following the getting started guide works on my machine and deploys every time.

mix archive.install hex phx_new
mix phx.new groundhog
cd groundhog && mix ecto.create
fly launch
# ...making sure to attach Postgres
# 🚀🚀🚀

Takeaways

Debugging with lilac-tinted glasses

Being slightly more Elixir-attuned after this saga, here's how I'd look at the crashloop now:

  • :net_kernel.get_state looks like Elixir making use of Erlang libraries
    • the syntax matches the expected form: :module.function()
  • function :net_kernel.get_state/0 is undefined or private is likely not because an object wasn't initialized - rather, it's probably from a nonexistent Erlang function being called
  • APIs grow more than they shrink - it's more likely that our Erlang version is stale than it is that a function was removed
  • Try searching for the function in question and see if it was added in (or removed from) a major Erlang version

Sure enough, net_kernel:get_state/0 was added in OTP 25.
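You can also ask the installed Erlang directly instead of searching the docs. This is a sketch with a made-up helper name (otp_exports); it loads the module first, because erlang:function_exported/3 only reports on modules that are already loaded.

```shell
# Ask the Erlang on PATH whether Module:Function/Arity is exported.
# Prints "true" or "false". The module is loaded first, because
# erlang:function_exported/3 only sees modules that are already loaded.
otp_exports() {
  erl -noshell -eval "code:ensure_loaded($1), io:format(\"~p~n\", [erlang:function_exported($1, $2, $3)]), halt()."
}

# Usage:
#   otp_exports net_kernel get_state 0
```

Given the above, I'd expect this to print false on the OTP 24 install from earlier and true on OTP 25, though I haven't gone back to check every patch release.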

PaaS is a thriving ecosystem

Platforms as a Service feel like a crowded space.

There are many companies trying to build infrastructure solutions for deploying code (Fly.io, Heroku.com, render.com, railway.app). The site I'm making doesn't require any rare infrastructure, so I based my choice between providers on sane-looking docs (like the walkthrough) and Fly.io blog posts. There's a sizable difference between Fly's walkthrough and those of the competitors (specifically for Elixir - render walkthrough, railway template).

At the same time, the open source community feels small.

It certainly helps to pick the infrastructure provider where open source contributors work, but it's still cool that the creator of Phoenix is watching forums and trying to make sure developers get up and running successfully.

Erlang and Elixir versions are decoupled! 🤯

Erlang versioning isn't part of mix.exs! Even though Elixir is built on Erlang, there's no way to specify Erlang dependencies from Elixir. This has been brought up before on the Elixir language forum, but there isn't much discussion. I'm still not sure what to make of that - I'd expect it's a combination of a few things:

  • The Erlang libraries are relatively stable, and deprecations are rare
  • Major version releases are steady, continual, and infrequent
    • Erlang major versions seem to get published every May
  • Bugs caused by regressions are likely uncommon

For now, though, it seems like a good thing to keep in mind.
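Since mix.exs can't enforce it, one low-tech option is a guard at the top of a deploy script. A minimal sketch, assuming you know the minimum OTP major release you need (25 here, because of net_kernel:get_state/0); otp_major_at_least is a hypothetical helper.

```shell
# Succeeds when the current OTP major release is at least the required one.
otp_major_at_least() {
  current="$1"
  required="$2"
  [ "$current" -ge "$required" ]
}

# Usage, e.g. at the top of a deploy script:
#   current="$(erl -noshell -eval 'io:format("~s", [erlang:system_info(otp_release)]), halt().')"
#   otp_major_at_least "$current" 25 || { echo "need OTP 25+, found $current" >&2; exit 1; }
```

It only compares major releases, which matches how the missing function showed up for me, but it won't catch anything that changed in a patch release.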

Good luck out there!