GraalVM Native Images Have a Major Flaw, But It Can Be Mitigated

I've started a discussion on StackOverflow without much success, so feel free to contribute there, or under this post.

The Problem with Reflection

Although we think of Java and most other JVM languages as statically-typed, the JVM muddies that assumption with one of its most powerful features - run-time reflection. This often (ab)used feature allows us to load any class we want at run-time, access any of its members, and even change its code. In the name of performance, GraalVM native images operate under a closed-world assumption, meaning that anything that isn't considered reachable by its slimmed-down version of reflection at run-time is discarded at build-time.

To be more exact, either the compiler figures out what is going to be needed for reflection through API calls with constant parameters, or you have to manually give it that information through metadata files. Anything else will throw a run-time exception when called.
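
To make the distinction concrete, here is a minimal sketch of the two kinds of reflective call. The class and method names (ReflectionLookup, constantLookup, dynamicLookup) are made up for illustration. On a regular JVM both calls succeed. In a native image, as far as I understand the build-time analysis, only the first can be resolved automatically, because its class name is a compile-time constant.

```java
public class ReflectionLookup {

    // The class name is a string constant, so a build-time analysis can see
    // exactly which class and constructor this call will need.
    public static Object constantLookup() throws ReflectiveOperationException {
        Class<?> type = Class.forName("java.lang.StringBuilder");
        return type.getDeclaredConstructor().newInstance();
    }

    // The class name arrives at run time (config file, request, user input).
    // Nothing in this method tells the compiler which classes to keep.
    public static Object dynamicLookup(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(constantLookup().getClass().getName());
        // Assumption for the demo: the class name comes from the command line.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(dynamicLookup(name).getClass().getName());
    }
}
```

The second method is the problematic one: unless every class name that can ever reach it is listed in a metadata file, the call is one of the holes described below.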

This essentially means that the compiler leaves holes in its supposedly closed world, and any non-trivial program is bound to fall through them at some point. What I mostly mean by non-trivial is anything that uses external libraries. Unless you, the programmer, can statically verify that each library you're using won't access a member outside of the native image's closed world, you can't know when your program will crash. I'm saying when, because given enough time, every unbounded variable will eventually reach a critical value.

Currently Proposed Solutions and Why They Don't Work

The community and the GraalVM team are (loosely) aware of these problems and offer the following solutions:

Run your application with a tracing agent that collects statistics on API calls and their values: This basically means observing how your application behaves at run-time and using that information at compile-time. It's been almost a decade since I've done any formal program verification, but I'm pretty sure that when we have unbounded inputs and the closed-world assumption is broken, there is no guarantee of program correctness. This would be like reaching 100% test coverage and claiming that the program has no bugs.
Other than that, we are often unable to collect these metrics in the program's target environment. Since serverless is often the target environment for native images, this is an immediate and obvious hurdle. Some frameworks like Micronaut and Quarkus apparently help with this, but I'll get to their shortcomings in a bit.
Libraries providing their own reachability metadata: The idea is that every library that you use provides information on what it might need for reflection. I'd argue that this is pointless since the biggest use case for reflection is gathering unknown information at run-time. Take the very popular JSON serialization library Jackson for example. It can't ship reachability information because it will use reflection on classes that I wrote. There's no way for their developers to statically declare every class that their code will be used on.
Besides that, this approach seems fairly new, and even AWS Lambda, arguably GraalVM's biggest target, doesn't care about maintaining its own reachability metadata.
In any case, folk wisdom teaches us that if your solution relies on the assumption that "everyone just...", it's already very likely to fail.
Frameworks like Micronaut or Quarkus: I have to admit that I don't know enough about these to judge them. It seems like at least Micronaut provides its own versions of many libraries you would commonly need, like for (de)serialization, and those are supposed to play nicely with native images. My problem with these frameworks is that, as far as I know, they don't stop you from using any library you want. You personally might know that something is safe to use, but can you say that for everyone working on that project during its lifecycle? What about future versions of every single library?
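
The Jackson point above can be illustrated with a toy example. This is not Jackson's code, only a hypothetical stand-in (TinySerializer) that does what reflective serializers do in principle: inspect the fields of whatever object it is handed. The Order class plays the role of application code that the library author has never seen.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

public class TinySerializer {

    // Hypothetical "library" code: it reads the fields of any object it is
    // given, so its author cannot know the affected classes in advance.
    public static String toJson(Object value) throws IllegalAccessException {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Field field : value.getClass().getDeclaredFields()) {
            // Skip compiler-generated and static fields.
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            field.setAccessible(true);
            // Simplification: every value is written as a quoted string.
            json.add("\"" + field.getName() + "\":\"" + field.get(value) + "\"");
        }
        return json.toString();
    }

    // Hypothetical "application" class, written long after the library shipped.
    public static class Order {
        private final String id = "A-1";
        private final int quantity = 3;
    }

    public static void main(String[] args) throws IllegalAccessException {
        System.out.println(toJson(new Order()));
    }
}
```

Any metadata shipped with the library could describe TinySerializer itself, but not Order. The fields of Order have to be registered by whoever wrote Order, which puts the burden back on the application developer.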

My Proposed Mitigation

In my opinion, the only way we can statically ensure correctness (in the sense of valid reflection calls) is to completely prohibit dynamic reflection in reachable code. The GraalVM native compiler already does an analysis of reachable code and reflection calls. Ideally, we could set a flag that tells it to fail on any reflection calls with unbounded variables. This will inevitably make many libraries unusable, but I'd argue that this type of formal correctness is very important for use cases that native images are targeting. In other words, I'd rather have the option to know if a library I'm using might cause a crash.

Right now, I'm not aware of such an option in the current version (v25) of GraalVM. We can do something similar with tools that prohibit calls to specific methods, but they might have a different view of the world from the compiler.

I will probably open a ticket with GraalVM and see where it goes.