2025: when software became a commodity
Are you building? What even IS quality?
Last year was a wild year when it comes to developing software and building systems.
We started the year with models that could reliably generate individual functions given enough hand-holding, removing some of the drudgery of building software, and ended it with Opus 4.5 reliably taking on complex tasks, allowing the programmer to focus on higher-level concerns like architecture and dataflow.
Skills made a huge splash because loading information lazily into the context window makes context management much easier.
Context Engineering should have gotten more attention, but I predict it will in 2026, as we move past generating loads of code at high speed.
It’s a Great Time to Be a Builder
While some engineers still lament that LLM-assisted coding – let’s call it agentic engineering – takes the joy out of programming and causes skill decay, others are busy building.
As with any tool, the responsibility for good outcomes lies with the user, and good tools nudge the user toward them.
Since LLMs are so flexible, this requires experience and a large number of hours spent interacting with the models.
You need to develop a “feel” for what individual models are good at, where they go off the rails, and what they struggle with.
But once you do, you can focus on the outcomes you create.
You can eliminate the boring parts to get to the interesting parts that let you learn more and hone your skills.
And that in turn allows you to build interesting software that is actually used.
This is difficult if your identity is attached to the process of writing software, and not the process of building products.
Both approaches require placing attention in different places, and operator attention is the limiting factor when the LLM can read and write code many times faster than you can.
Coming from the writing-software side, you want to focus on system architecture, data structures, and data modelling. The concepts you are dealing with are mainly technical, and LLMs come in handy as sounding boards and typing aids.
Approaching problems from the product side couldn’t be more different: technological choices are relevant only in so far as they allow you to try, validate, and develop new product ideas to see what is worth keeping and what needs to be discarded. The faster you can iterate, the better, and features that have made it through this filter deserve to get more technical attention so that they don’t turn into attention sinks over time.
Now, the more technical expertise you have, the better you can execute on product iteration because you are positioned to make better technical choices, but critically you must still start from the product, not the technology.
A concrete example from Amp: I gave Amp a built-in tool to extract targeted information from media files which was technically very simple but resonated a lot with Amp’s users because it addressed a real pain point. Thinking about technology first when looking for things to build in the product would have made me miss this opportunity.
The New Quality Bar
With LLMs and coding agents ready to take care of the grunt work of launching a new project, the bar for new projects has risen.
Baseline expectations now include:
developer and agent-friendly documentation,
a logo,
a website,
integration with popular package managers.
None of these things are hard, but they used to require enough effort not to be worth it for a side project.
Now getting a basic version of any of these done is just a prompt away. Of course, it’s not perfect and needs manual touch-ups after the fact, but it’s easy enough to actually ship it now that expectations have changed.
Which brings up the larger question: what even is quality?
The best definition I’ve seen so far was from my time at Toptal:
Quality means meeting expectations.
This simplistic definition skirts around the fact that usually there are multiple parties with different expectations involved, but generally the point holds.
Applied to the systems you are building: how important is what’s under the hood if it meets all expectations and can be discarded and rebuilt with a relatively small investment?
Encoding Expectations
The big question to answer in 2026 is how to encode and specify expectations.
Only when expectations toward the software have been stated can we measure quality.
This goes beyond unit and integration tests to encompass metrics from the running system like memory usage, rate of exceptions, average response time.
Personally, this nudges me to think that we’ll keep fast unit tests for iterating in the inner loop and, in addition, get more blackbox tests that operate the system to assert certain qualities.
These tests should make zero assumptions about the implementation and use the application through well-defined interfaces and protocols, written under the assumption that the entire codebase will be burned down and rewritten eventually.
One way to achieve this is to treat test cases as data and have a custom test harness program evaluate them. I’m playing around with this approach in Feather, where the same test suite is used for multiple implementations of the system.
So far it has given me good results: discrepancies are easy to spot, the agent can fix them quickly, and I can use a known-good implementation to create a golden master for a test case.
The price is the cost of developing the test harness program, which in practice seems to be low, since programs of that type aren’t new or particularly complex.
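The test-cases-as-data idea can be sketched in a few lines. Everything below is illustrative – the names (CASES, run_suite, impl_a, impl_b) are hypothetical and not Feather’s actual harness – but it shows the shape: the suite is plain data with no implementation assumptions, and one small harness runs it against any implementation of the same interface.

```python
# Test cases are data: name, input, expected output. Nothing here refers
# to any particular implementation.
CASES = [
    {"name": "empty",  "input": "",    "expected": ""},
    {"name": "single", "input": "a",   "expected": "A"},
    {"name": "mixed",  "input": "aBc", "expected": "ABC"},
]

def run_suite(impl, cases):
    """Run every case against `impl`, returning a list of discrepancies."""
    failures = []
    for case in cases:
        got = impl(case["input"])
        if got != case["expected"]:
            failures.append((case["name"], case["expected"], got))
    return failures

# Two independent implementations share the same suite.
def impl_a(s):
    return s.upper()

def impl_b(s):
    return "".join(c.upper() for c in s)

for impl in (impl_a, impl_b):
    print(impl.__name__, run_suite(impl, CASES))

# A known-good implementation can also generate the expected outputs,
# i.e. act as the golden master for new cases:
golden = {c["name"]: impl_a(c["input"]) for c in CASES}
```

Because the harness only sees inputs and outputs, swapping out an entire implementation – or burning one down and rewriting it – leaves the suite untouched.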
Rebuilding Foundations
A lot of the software we use carries around properties of the environment in which it was born: React, Angular, and Vue were born at a time when vanilla JavaScript wasn’t particularly capable, browser support of useful features was spotty, and smartphones with embedded browsers were new, just introducing unstable low-bandwidth connections back into the equation.
What would a web framework built in 2025 look like? HTTP/2 brought a much higher simultaneous-connection limit, smartphones are ubiquitous, and mobile connectivity has improved by leaps and bounds. This allows for a different set of tradeoffs in the design and implementation of the framework, unencumbered by history.
And agentic engineering has lowered the cost of code generation so much that rebuilding infrastructure on new foundations has become feasible to explore:
what if you built an implementation of your favorite programming language that used WASM as its internal bytecode or sole compilation target?
what if you built something like Ruby on Rails, Next.js, or any other fullstack framework with coding agents in mind, under the assumption that the developer won’t even be looking at the code?
Supabase gives you Postgres through an API call, what if you gave every customer their own database?
Server-Sent Events have become mainstream and well-supported; what would building a web application realtime-first look like?
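Part of what makes the realtime-first question attractive is how simple the SSE wire format is – each event is just a few text lines over a held-open HTTP response. A minimal sketch of the encoding side (format_sse is a hypothetical helper, not part of any particular framework):

```python
def format_sse(data, event=None):
    """Encode one Server-Sent Event in the EventSource wire format:
    optional `event:` line, one `data:` line per line of payload,
    and a blank line terminating the event."""
    msg = ""
    if event is not None:
        msg += f"event: {event}\n"
    for line in data.splitlines() or [""]:
        msg += f"data: {line}\n"
    return msg + "\n"

print(format_sse("hello", event="greeting"))
# prints:
# event: greeting
# data: hello
# (followed by the blank line that terminates the event)
```

A server streams these strings down a response with Content-Type text/event-stream, and the browser’s built-in EventSource reconnects automatically – no custom protocol layer required.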
I’m currently exploring this direction by re-implementing TCL in the form of Feather – TCL was created at a time when most applications were written in C, and it has a great story for embedding in C.
But today’s applications are different: they already provide a lot of the infrastructure that TCL does, like garbage collection and rich data structures, and having to map data back and forth with the host language is tedious.
How can we reimagine this so that the embedded language only provides the syntax and semantics, but data, memory and I/O are managed by the host?
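A toy sketch of that split, with a hypothetical MiniEmbed class (this is an illustration of the idea, not Feather’s actual design): the embedded language contributes only parsing and dispatch, while variables remain ordinary host objects – managed, garbage-collected, and mutated by the host, with no marshalling layer in between.

```python
import shlex

class MiniEmbed:
    """A minimal command interpreter where the host owns all data."""

    def __init__(self):
        self.commands = {}  # name -> host callable
        self.vars = {}      # values are plain host objects, never serialized

    def register(self, name, fn):
        self.commands[name] = fn

    def eval(self, script):
        result = None
        for line in script.strip().splitlines():
            words = shlex.split(line)
            if not words:
                continue
            cmd, args = words[0], words[1:]
            # $name substitutes the host object itself, not a copy.
            args = [self.vars.get(a[1:], a) if a.startswith("$") else a
                    for a in args]
            result = self.commands[cmd](self, *args)
        return result

interp = MiniEmbed()
interp.register("set", lambda i, name, value: i.vars.__setitem__(name, value))
interp.register("append", lambda i, lst, item: lst.append(item))

interp.vars["items"] = []  # a host-side list; the host GC owns it
interp.eval("""
set greeting hello
append $items world
""")
```

After eval, interp.vars["items"] is the same Python list the host created, now containing "world" – the script mutated host data directly instead of copying values across a boundary.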
And why even care in the first place?
An interactive Read-Eval-Print-Loop (REPL) allows for much faster feedback cycles both for humans and agents, but the options for embedding languages for that purpose are pretty slim.
The models are well-trained in Bash, and TCL has so much syntactic and semantic overlap with it that hopefully this training carries over.
Like in this prototype, where the UI is defined through Feather/TCL and can be inspected and changed at runtime, just like in the browser.
What’s Next
My goal for 2026 is to write more in general, and especially here. To make this work, I’ve decided to focus on reporting on my active side projects.
The biggest non-Amp projects in the works currently are:
Feather, a tiny programming language for adding interactivity to applications, fully built by coding agents.
And the Decode Estonian Tutor Bot where I explore architecture approaches that sound good in theory, to see whether they are a good fit in practice – like giving every user their own complete instance of the system.
Stay tuned, and have a great start into the New Year!



