
How to manage resources with sbt and stay calm
08/06/2025
Meta
I'm writing this post mostly to capture my thoughts and to have a place to check how I came to this implementation.
Pre
So life is simple. You are writing JSON <=> DB adapters in your favourite language. You are not always happy with the tooling.
One day you have a bright idea to bring some QoL changes into your and other devs' lives. You decide to automate database schema migration.
You are a clever one, so you pick one of the industry-standard frameworks. Doesn't matter which one. You write some glue code to work with DDLs from code, provision databases during tests and migrate databases during the app's startup phase (not the brightest decision, but whatever). You place all your precious DDLs in src/main/resources.
You run your CI and see all green checks.
You are happy. It is spring. Weather is great.
A couple of weeks later you check which tests your CI runs on your recent PR with the new schema migration. You do not see any tests for DAOs and other parts of the system that depend on these DDLs.
Cold sweat flows down your back.
🚨🚨🚨
You rush to check how many PRs have already passed through CI and landed in prod. Not a single one, you are the first. Nice, time to fix your mess. But first...
What is the problem, dude?
So why does this happen?
We overuse sbt submodules in our monorepo. So with all these improvements we've ended up in a situation where we have a submodule with DDLs and a submodule with business code and tests.
An oversimplified diagram which uses the wrong primitives and relations:
Despite everybody hating sbt, it is a pretty clever piece of software with rich caching capabilities. Extremely rich. Overly rich.
So when you make changes in schema/src/main/resources, this bastard does not trigger the tests in implementation/src/test/scala if they were successfully run at least once. It even ignores the fact that the implementation submodule depends on the schema submodule.
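If you translate that picture into build.sbt terms, it is roughly this (the project names here are made up to match the paths above):

lazy val schema = project
  .in(file("schema"))            // the DDLs live in schema/src/main/resources

lazy val implementation = project
  .in(file("implementation"))    // DAOs and their tests live here
  .dependsOn(schema)             // these tests should rerun when the DDLs change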
Now some people with a PhD in CS may say that it's obvious because... some obscure shit. Don't care, the real world is a harsh place where you need to get the job done.
Solution
Obviously we need to explain to sbt that the files have changed, the caches are now invalid, and everything needs to run again.
Straightforward idea
Ok, so you can hook the compile and test steps, check the hashes of the DDL files, invalidate the cache and hope for the best. And this may work, but only if all your code, DDLs and tests are placed in one single module. Keep digging.
Opus magnum
We use the sbt-protoc plugin to render protobuf IDLs into code. I shamelessly stole my idea from this plugin.
The general idea is pretty simple:
- Place DDLs outside of resources (controversial, but required)
- Explain to sbt that these files should be moved to resources
- Generate code on the fly which links these DDLs with real Scala code.
Code
I will not share the whole code, just point out the main caveats you hit during plugin development. Nope, it is still not the full code!
Basic sane things that establish a barebones auto plugin:
import sbt.*
import sbt.Keys.*
import sbt.plugins.JvmPlugin

object DDLCacheInvalidation extends AutoPlugin {

  object autoImport {
    object DDL {
      val sourceFolder =
        SettingKey[File]("ddl-source-folder", "Directories to look for DDL files")
      val changed =
        TaskKey[Seq[File]]("ddl-changed-schemas", "List all DDLs if they have changed")
      val helperObjects =
        TaskKey[Seq[File]]("ddl-helper-objects", "Make a hash file for each DDL and keep it as a Scala file")
    }
  }

  // Default config key
  val DdlConfig = config("ddl")

  import autoImport.*

  override def trigger = allRequirements

  // Limit support to JVM-only submodules
  override def requires: Plugins = JvmPlugin

  override def projectConfigurations: Seq[Configuration] = Seq(DdlConfig)

  // Register the plugin in each compile and test configuration.
  override def projectSettings: Seq[Setting[_]] =
    Seq(Compile, Test).flatMap(configuration => inConfig(configuration)(ddlConfigSettings(configuration)))

  private[this] def ddlConfigSettings(conf: Configuration): Seq[Def.Setting[_]] =
    Seq(
      // Treat `module/src/main/ddl` as a source folder. Makes IDE support a lil bit nicer
      DDL.sourceFolder := (conf / sourceDirectory).value / "ddl",
      DDL.changed := listSchemasIfChanged(conf).value,
      DDL.helperObjects := createDdlHelperObjects(conf).value,
      // Register generators!
      (conf / resourceGenerators) += DDL.changed.taskValue,
      // This one is the major one! Despite the weird naming it calls the main chain of resource and synthetic Scala code generation.
      (conf / sourceGenerators) += DDL.helperObjects
        .dependsOn(DDL.changed)
        .taskValue,
      (conf / unmanagedSourceDirectories) += DDL.sourceFolder.value,
    )
}
All the other guts spin around the idea that we copy resources from module/src/main/ddl to the managed resources output folder and register each resource file in the built-in cache. I prefer to use the HashFileInfo cache instead of ModifiedFileInfo.
Tracked.inputChanged[FilesInfo[HashFileInfo], Set[File]](
  cacheFile / "input"
)
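To give that fragment a bit more body, here is a minimal sketch of what the whole cached copy task could look like. The real plugin drives Tracked.inputChanged directly as shown above; this sketch reaches for FileFunction.cached with hash-based input tracking instead, and listSchemasIfChanged is just the hypothetical name used in the settings block earlier.

private[this] def listSchemasIfChanged(conf: Configuration): Def.Initialize[Task[Seq[File]]] =
  Def.task {
    val ddlDir = (conf / DDL.sourceFolder).value
    val outDir = (conf / resourceManaged).value / "ddl"
    val cache  = streams.value.cacheDirectory / "ddl-resources"

    // Re-copy a DDL into managed resources only when its content hash changes
    val copyIfChanged =
      FileFunction.cached(cache, inStyle = FilesInfo.hash, outStyle = FilesInfo.exists) { (in: Set[File]) =>
        in.map { src =>
          // keep the same folder layout under resourceManaged as under src/main/ddl
          val dst = outDir / IO.relativize(ddlDir, src).getOrElse(src.getName)
          IO.copyFile(src, dst)
          dst
        }
      }

    copyIfChanged((ddlDir ** "*.sql").get.toSet).toSeq
  }

The important bit is that the copies land under resourceManaged and are returned from a resource generator, so sbt treats them as tracked task output instead of a pile of files it feels free to ignore.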
The hash function is not that CPU-heavy, especially when you already have 150+ sbt submodules. Also, the last-modified date gets erased when sbt bundles files into .jars. They call it 'reproducible builds', my ass.
Also, it is a good idea to have an easily distinguishable file header! With such a header you can programmatically delete only the files generated by your plugin.
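A quick sketch of that cleanup, assuming a hypothetical header constant (the exact text is up to you):

// Recognizable marker written at the top of every generated file
val GeneratedHeader = "/** DO NOT EDIT THIS FILE\n * Generated by DDLCacheInvalidation"

// Delete only the plugin's own output, leaving hand-written files in the same folder alone
def deleteStaleGenerated(dir: File): Unit =
  (dir ** "*.scala").get
    .filter(f => IO.read(f).startsWith(GeneratedHeader))
    .foreach(f => IO.delete(f))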
How it lands in the code
As I've mentioned, the plugin generates Scala code. Here is an example of such a file:
/** DO NOT EDIT THIS FILE
* Generated by DDLCacheInvalidation
* UUID: 263e085e-fc04-4771-922a-7fcb8be19291
*/
package modules.ddl.dbsConsole
package object mysql {
  val console = new _root_.com.xxx.util.db.TestDatabase(name = """console""")
}
It is generated from the dbs/console submodule:
modules/dbs/console/src/main/ddl
└── mysql
└── console
└── 0001-console-full-bootstrap.sql
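For completeness, here is a hypothetical sketch of the other task from the settings block, createDdlHelperObjects, i.e. the thing that turns a tree like this into the package object above. The post does not show how the real plugin derives the modules.ddl.dbsConsole package from the module path, so the mapping below is a guess:

private[this] def createDdlHelperObjects(conf: Configuration): Def.Initialize[Task[Seq[File]]] =
  Def.task {
    val ddlDir = (conf / DDL.sourceFolder).value
    val outDir = (conf / sourceManaged).value / "ddl"

    // assumed mapping: project name "dbs-console" -> package suffix "dbsConsole"
    val pkgSuffix = name.value.split('-').toList match {
      case head :: tail => (head +: tail.map(_.capitalize)).mkString
      case Nil          => name.value
    }

    // one package object per dialect folder (mysql, ...), one val per database folder (console, ...)
    (ddlDir * DirectoryFilter).get.map { dialect =>
      val file = outDir / s"${dialect.getName}.scala"
      val vals = (dialect * DirectoryFilter).get
        .map(db => s"""  val ${db.getName} = new _root_.com.xxx.util.db.TestDatabase(name = "${db.getName}")""")
        .mkString("\n")
      IO.write(
        file,
        s"""/** DO NOT EDIT THIS FILE
           | * Generated by DDLCacheInvalidation
           | * UUID: ${java.util.UUID.nameUUIDFromBytes(file.getPath.getBytes("UTF-8"))}
           | */
           |package modules.ddl.$pkgSuffix
           |package object ${dialect.getName} {
           |$vals
           |}
           |""".stripMargin
      )
      file
    }
  }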
During the tests I use the TestDatabase instances to bootstrap databases from the correct resources. This completes the last part of the solution: now your tests really depend on the generated files.
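Just to show what that dependency looks like from the test side, here is a sketch. munit and the bootstrap call are assumptions for illustration; only the generated mysql.console value comes from the plugin:

import modules.ddl.dbsConsole.mysql

class ConsoleDaoSpec extends munit.FunSuite {

  // `mysql.console` lives in generated source that is re-generated whenever a DDL under
  // modules/dbs/console/src/main/ddl changes, so this suite gets invalidated together with it.
  private val db = mysql.console

  test("console DAO works against a freshly migrated schema") {
    // assumed API: provision a database and apply 0001-console-full-bootstrap.sql and friends
    // db.bootstrap()
    // ... run DAO assertions against the provisioned database ...
  }
}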