Systematic pAIn

The real pain of AI

Apr 17, 2026

Today I had a conversation with myself. Well, actually, I had a lovely old chat with my back-of-house AI project about why my front-of-house AI projects have stopped working properly, certainly today with the whole Sonnet 4 thing, but that’s another topic altogether. Anyhow.

Let me set the scene.

I have multiple projects, with locked content and custom skills uploaded. Every project has its own tailored Project Instructions, plus PK, memory, handovers, etc.

Plus a ClickUp workspace with a canonical hierarchy document and a routing decision tree.

By any measure, my AI setup is mature. I have spent considerable time building it.

It is documented, version-controlled, and governed by rules I have codified into the system itself.

And today, Claude ignored almost all of it.

What actually happened

I have been trying to get a particular piece over the line for three weeks. The content is locked, the template exists with built-in instructions, the skills are loaded, and the rules are clear: this is a formatting job, not a writing job.

Instructions: Place locked content into template slots. Do not invent. Do not change. Do not add.
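To show just how mechanical that instruction is, here is a minimal Python sketch of slot filling done deterministically. It is an illustration only, not the actual workbook: the `$headline` and `$body` slot names, the sample copy, and the `fill_template` helper are all hypothetical.

```python
from string import Template


def slot_names(template_text: str) -> set[str]:
    """Collect the $name / ${name} slots that appear in a template."""
    names = set()
    for match in Template.pattern.finditer(template_text):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names


def fill_template(template_text: str, locked: dict[str, str]) -> str:
    """Place locked copy into template slots. No inventing, changing, or adding."""
    slots = slot_names(template_text)
    missing = slots - set(locked)   # slots with no locked copy to fill them
    unused = set(locked) - slots    # locked copy with nowhere to go
    if missing or unused:
        raise ValueError(
            f"missing copy for {sorted(missing)}; no slot for {sorted(unused)}"
        )
    # substitute() inserts each value exactly as given.
    return Template(template_text).substitute(locked)


# Hypothetical template and locked copy, for illustration only.
template_text = "Title: $headline\n\n$body\n"
locked_copy = {"headline": "Locked headline", "body": "Locked body."}
print(fill_template(template_text, locked_copy))
```

The point of the sketch is the failure mode: when a slot or a piece of copy does not line up, it stops with an error rather than improvising.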

Claude renamed files instead of version-controlling them. Twice. Despite having a Skill that contains the exact naming convention. It invented content that has never appeared in any conversation. It overwrote locked copy with generic alternatives. It drifted into content discussion when told, repeatedly, to focus on structure.

Then the thread froze. The model switched to something I had never seen before, a legacy Sonnet 4, mid-conversation, with no warning. I copied the entire frozen thread into a new one, with the handover and thread directory. Claude could not find the locked content. It was IN the pasted text. Present in the context window, and yet Claude could not retrieve it.

What I diagnosed

I went to my Architecture and Agents project, the back-of-house stream that handles infrastructure questions, and asked it to diagnose why the other projects were failing.

Here is what we found.

My setup is architecturally correct. Anthropic’s own documentation confirms that each project has its own isolated memory space. Skills are global: they fire across all projects. The project structure, the tailored instructions, the scoped knowledge bases, all of it is set up the way the manual says to set it up.

And here is the thing that proves it is not my setup: EP02, my most complex project, with SOPs, decision matrices, and heavy governance documentation, works brilliantly. The only issue is thread length, which Cowork solves. Accuracy is never the problem. EP02 is a single project with everything in one place, carrying significantly more cognitive load than any of the contAIn projects.

So the problem is not the architecture, the volume, or the context window.

The problem is that Claude reads its instructions and then does not follow them.

The gap not spoken about

Most AI courses and tutorials of the “here’s how to set up your AI workspace” variety tell you the same thing. Build skills, use projects, write good instructions, set up memory. If you do all of that, the promise is that Claude will remember you, maintain your standards, and produce consistent work. Hell, it might even make you a gazillionaire by Monday...

Well, I am not a gazillionaire, but I did do all of that. Meticulously. With version control and governance that most consultancies would never match and most organisations would envy.

Yet Claude still drifts, invents, and overwrites locked content with whatever it decides is better. It still breaks its own rules while those rules are literally loaded into its system prompt.

So, as you do, I checked with the other lads.

Grok: You’re Not Crazy. This Is a Real, Widespread Problem in 2026

ChatGPT: Short answer: yes, it absolutely happens. What you’re describing (AI ignoring instructions, drifting, inventing, losing track of provided content) is a real and well-known behavior pattern.

Perplexity: Yes, this can happen. AI systems can ignore instructions, drift off-topic, or produce invented details, especially when prompts are long, conflicting, or the conversation gets complex.

This is not a setup problem. This is an execution fidelity problem. It is not something you can fix with better prompts, more documents, or a different project structure.

What I am not saying

I am not saying AI is useless. I built my entire methodology using Claude. EP02 proves it works, spectacularly, under heavy load. The hundreds of hours of real project work that form contAIn’s evidence base were all done with AI, but directed and checked by me.

I am not saying Skills and Projects are pointless. They genuinely help. My Skills have caught errors that would have taken me hours to fix manually, and project isolation keeps my work clean.

What I am saying is that the gap between “AI can do this” and “AI reliably does this every time” is wider than anyone selling AI courses wants to admit. And the people who need to hear that most are the ones who have done everything right, followed every tutorial, built every skill, and are still watching their AI produce garbage on a Friday afternoon for no discernible reason.

The honest question

So here is where I am. My system is correct. My setup is mature, my methodology is proven on complex, real-world projects, and my AI still cannot reliably format a workbook without inventing content, breaking naming conventions, and ignoring instructions it has literally just read.

If you are experiencing the same thing, I would genuinely love to hear from you. Because either I am missing something fundamental, or this is a known limitation that few in the AI space are being honest about.

If it is the latter, then the human governance layer, the thing that catches AI when it drifts, is not a nice-to-have. It is the entire point.
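Part of that governance layer can be mechanical. As a rough sketch only, assuming the locked copy is available as plain text, a few lines of Python can flag any locked passage that no longer appears verbatim in what the AI handed back. The `find_drift` helper, the passage names, and the sample text are all hypothetical.

```python
def find_drift(locked_passages: dict[str, str], output_text: str) -> list[str]:
    """Return the names of locked passages missing from, or altered in, the output.

    Whitespace is normalised on both sides, so re-wrapped lines do not count
    as drift. Any change to the wording itself does.
    """
    normalised_output = " ".join(output_text.split())
    return [
        name
        for name, passage in locked_passages.items()
        if " ".join(passage.split()) not in normalised_output
    ]


# Hypothetical locked copy and a hypothetical AI output, for illustration only.
locked = {
    "headline": "Configure your system.",
    "cta": "Book a call today.",
}
ai_output = "Configure your\nsystem.\n\nBook a free call today!"

print(find_drift(locked, ai_output))  # the reworded "cta" passage is flagged
```

A check like this only catches locked copy that was changed or dropped. It says nothing about content the model invented around it, which still needs a human read.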

Thoughts, please?

Sam

configure YOUR system. contAIn the chaos. control YOUR outcome.
