How I Use LLMs These Days And Somewhat Enjoy It

I started using Claude Code in January after a month and a half of hearing people on the internet rave about Opus 4.5 being a “game changer”. Previously, I had ignored the hype around AI ever since ChatGPT came out. Before January I hadn’t used any LLMs as I found Google search to be sufficient for whatever I was doing.

The other reasons included the negativity around AI: the market forces (read: rich, greedy individuals) putting people out of jobs and framing it as a good thing; skepticism around why large organizations are implementing mandates and leaderboards for AI usage; and the thought that we are being duped into getting addicted to these tools while they are heavily subsidized, only for prices to be jacked up later. That last one is proving to be true based on the GitHub news from this week, and it has historical precedent in how Netflix and Uber moved from market consolidation and user acquisition to enshittification so “the line always goes up”.

What made me finally try it for myself was the thought that a lot of smart people, maybe 100 times smarter than I’ll ever be, are working on this so it will only improve. And maybe LLMs won’t be as revolutionary as Sam Altman, Dario Amodei, Jensen Huang or anyone else that has a financial stake in them would have you believe, but they still bring utility to some people just like any other tool. As a person in the tech industry the least I could do is try before I deny.

One of the first things I used it for was help in making the homepage of this website, specifically the animation logic and the HTML Canvas drawing. I think that would have taken me a month or more on my own, given I had no experience with drawing graphics in HTML Canvas, and with my rusty trigonometry (and math in general) I would never have come up with the rotation logic so quickly.

What didn’t work as well as I’d hoped

TL;DR: I over-promised and under-delivered.

I had read Anthropic themselves suggest treating Claude as a “slot machine” in some contexts (see page 11).

When faced with merge conflicts or semi-complicated file refactoring that’s too complex for editor macros but not large enough for major development effort, they use Claude Code like a “slot machine” - commit their state, let Claude work autonomously for 30 minutes, and either accept the solution or restart fresh if it doesn’t work.

Others suggested treating it as an overeager, very knowledgeable junior engineer.

I had been reading anecdotes online which made me want to finally experiment with the tool. At the same time my team’s product manager also did his own experiments. My philosophical feelings aside, I was for trying out LLMs if it meant that I could deliver more ambitious goals in a shorter time. On the other hand, I was cognizant of the fact that I would be giving up the satisfaction of figuring things out myself, but thought I would get satisfaction from completing the mission on time. A discussion involving executives, product managers and engineers led to the decision to make use of AI assisted coding for the first time in our company on a greenfield project being kicked off in January.

In the early days, while just ideating and creating UI prototypes, things seemed to be going quite well. We were getting results faster than if we had been writing all the code ourselves, and we were able to get feedback and iterate quicker. It was when we started to integrate our backend and had real data flowing through the system that the cracks started to appear and things started going off the rails. In the end I could not keep my promise on the timeline I agreed to.

I felt that I wasn’t as close to my code as I was on previous projects. It’s as if my previous codebase was a modest house where I had placed every piece of furniture myself, some inherited, some bought years ago, some thrifted. And I knew where everything was because I put it there myself. I knew which plates and bowls were in which kitchen cupboard. Which clothes were hanging in which closet. What tools I had in the shed.

This new project was as if I were a rich man living in a 20-room mansion who had an interior decorator do everything based on a few meetings, and who didn’t know where anything was because the servants took care of all that. Then one day I am alone and I can’t find anything.

I also felt that I was losing my edge and was (and still am) afraid of skill atrophy.

I wouldn’t say that people shouldn’t use AI for feature development at all, but I would say don’t assume that it will save a significant amount of time, though I concede that it might in some cases. Because of its “slot machine” nature, if the requirements of what you are trying to do match what other people have done in the past you are more likely to hit a “jackpot”.

Reading the research and realizations

I came across the article How AI assistance impacts the formation of coding skills a month into my journey with AI and found a lot of truth to it. Though I have yet to read the full research paper, the article confirms that using LLMs in certain ways “led to a statistically significant decrease in mastery”.

junior developers or other professionals may rely on AI to complete tasks as fast as possible at the cost of skill development—and notably the ability to debug issues when something goes wrong.

Other research shows that when people use AI assistance, they become less engaged with their work

This really resonated with me based on my own experience. I realized I had become a bit scatterbrained in the past month.

Cognitive effort—and even getting painfully stuck—is likely important for fostering mastery.

I think the above is also very important. There is a stark difference between putting in the effort and getting things handed to you on a silver platter. I had the vision for my front page for two years. I could argue that I had researched what components and what libraries I would use and that I have a high-level understanding of how everything is laid out, so I could have made it like I envisioned on my own, albeit it would have taken a lot longer. But I have no way of knowing that since it hasn’t been tested. I didn’t share the page on Reddit like I thought I would when I was done with it, as I don’t feel proud of it because I didn’t tackle the hardest part myself. I still have a learning debt that I have to pay off.

The Anthropic study qualitatively analyzed AI interaction patterns and how they relate to skill development, by having participants do tasks with AI and quizzing them on concepts later.

The low scoring patterns were:

AI delegation: relying on the AI fully.
Progressive AI reliance: asking a question that ends up with you handing over code writing to AI.
Iterative AI debugging: starting off work manually and then debugging with AI when you hit a roadblock. Rather than clarifying understanding, using AI to power through to a solution.

And high scoring patterns were:

Generation-then-comprehension: generating code and manually copy-pasting it into an editor, then asking AI why it works.
Hybrid code-explanation: asking for code together with explanations, then reading through both.
Conceptual inquiry: using AI only for research, with no code generation.

In January I had been using AI for work using low scoring patterns “AI delegation” and “Progressive AI reliance”. Because of the early wins I had become somewhat addicted to doing things that way even though I knew in my gut it wasn’t good for me long term.

Traditionally when working on a problem I am typing away and only focused on that problem. But with AI-assisted coding, after writing a prompt it takes 1-5 minutes for the assistant to generate the code. During this time my mind goes on tangents, either starting to think/stress about a future problem or starting to read work chats.

Finally, I found what works for me

Documentation

I am a very slow writer because I think more than I type. I pause a lot and think a lot. I get up and pace around. It took me 3 hours to type up to this point but I have already fully “written” this article in my mind and am going over and over it making adjustments. I can do this for this article because it is in my free time and I consider writing a craft so I don’t want the AI to type even a word of it.

While I think of software development as a craft as well, it is also my profession, so this pace just doesn’t work. Documentation has been my weak point and I procrastinate on it a lot. Previously I would just create documents or sheets called “links dump” or “brain dump” around a certain topic and paste a lot of links in them with a few lines of narration, questions or quotes from the links. That, combined with pictures of diagrams drawn in whiteboard sessions and notes taken on the fly that even I had trouble understanding after a month, would be my baseline level of documentation. I would go back after the fact to do proper documentation with proper headings and sections and convert the whiteboards to draw.io diagrams. But lately, more often than not, this would be abandoned in favour of something else.

Recently what I have started doing is feeding the whiteboard diagrams into LLMs to generate draw.io XML markup. To generate written documentation, I also feed in the whiteboard diagrams, the rough notes and the link dumps, along with some instructions around structure and any relevant information added at prompt time.

This gets me over blank canvas paralysis. The documentation the AI outputs will likely not completely match the original intent or reflect reality. But it most likely won’t be complete garbage either.

Cunningham’s Law states:

the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.

This principle compels me to correct the AI. I also have to cut down and edit the output because by default LLM output can be quite verbose. This is still faster to do than writing from scratch. Seeing what is not correct makes me realize and articulate what is correct.

AI Rubber Ducking

For those unfamiliar with the term:

Rubber duck debugging is a debugging technique in software engineering, wherein a programmer explains their code, step by step, in natural language—either aloud or in writing—to reveal mistakes and misunderstandings.

The name is a reference to a story in the book The Pragmatic Programmer. It tells a story of a developer who carried a rubber duck and explained their code to it line by line.

A lot of people these days outsource all their thinking to AI. While I see the utility in using LLMs as a research tool they aren’t always accurate and I’d still like to keep using my brain and keep getting smarter.

I have found that LLMs can be useful when used in tandem with Google searches. Google searches work best when you know the right keywords; with LLMs you can just describe things and you get an output from which you can extract keywords. You turn “unknown unknowns” into “known unknowns”. What you get out of it might not be what you are looking for, but it can still be useful and filed away for later. It can also be total garbage.

The discipline I have built around using AI for research is that I always start with a rubber duck document. First I describe the context and the problem statement I am trying to solve. Then I describe how things currently work. I write down the obvious things that I wouldn’t have to say to a colleague. Then I describe as many possible solutions as I can think of, followed by questions, considerations, and open research items.

I research the questions and considerations myself first and think of follow-up questions as well, but I don’t write them down in the document. I then paste the document into the LLM. This is the prompt I use:

Imagine this as a whiteboarding/brainstorming session. I’m going to present my research to you. Then you ask clarifying questions and make comments.

Most of the time, 80%-90% of what the LLM outputs, both questions and answers, I have already thought of. If something I haven’t thought of seems immediately useful, I research it and try to verify it against official documentation or forum discussions. If I can’t find anything or I find the claim dubious I drill down.

Grunt Work

Developers love spending a day or two automating a task that takes 5 to 15 minutes of their time to do manually to improve developer experience. This is not always possible as this work is a distraction from more pressing work that needs to be delivered to customers and generate revenue.

Things that improve developer experience can be delegated to AI without much thought, as these are scripts you know are possible but just haven’t found the time to write.

For example, I recently had Claude write a script to look at the sObjects in a branch of a repo and create TypeScript types for them. I did find a repo on GitHub that does this, but it hasn’t been updated in 2 years. I figured it would take more effort to review someone else’s code than to describe what I want, have it generated and test it. This is way less stressful than using LLMs heavily for feature development, because if things blow up they don’t blow up publicly in front of customers.
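To give a feel for the kind of script I mean, here is a minimal sketch of the type generation step. It is not the script Claude wrote for me. The FieldInfo shape, the TYPE_MAP table and the Ticket__c example are all made up for illustration, and a real script would first have to read the field definitions out of the repo.

```typescript
// Hypothetical, simplified description of one sObject field. A real script
// would first parse this out of the object metadata files in the repo.
interface FieldInfo {
  name: string; // API name, e.g. "Name" or "Priority__c"
  type: string; // metadata field type, e.g. "Text" or "Checkbox"
  required: boolean;
}

// Assumed mapping for a handful of field types. Anything not listed
// falls back to `unknown` so the gaps are easy to spot in review.
const TYPE_MAP: Record<string, string> = {
  Text: "string",
  Picklist: "string",
  Date: "string", // assumes dates arrive as ISO strings
  DateTime: "string",
  Checkbox: "boolean",
  Number: "number",
  Currency: "number",
  Percent: "number",
};

// Builds the source text of one TypeScript interface for one sObject.
function toInterface(objectName: string, fields: FieldInfo[]): string {
  const lines = fields.map((f) => {
    const tsType = TYPE_MAP[f.type] ?? "unknown";
    const optional = f.required ? "" : "?";
    return `  ${f.name}${optional}: ${tsType};`;
  });
  return [`export interface ${objectName} {`, ...lines, "}"].join("\n");
}

// Example with a made-up object:
const ticketTypes = toInterface("Ticket__c", [
  { name: "Name", type: "Text", required: true },
  { name: "Priority__c", type: "Picklist", required: false },
  { name: "Escalated__c", type: "Checkbox", required: false },
]);
// ticketTypes now holds:
// export interface Ticket__c {
//   Name: string;
//   Priority__c?: string;
//   Escalated__c?: boolean;
// }
```

Falling back to `unknown` instead of guessing is deliberate in this sketch: an unmapped field type shows up in the generated file rather than silently becoming a string.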

Even if grunt work is customer facing, I would still consider AI use with some caveats. An example is updating all user facing text to support internationalization using a library like i18next. This is a task that if done manually would require a lot of time and complete focus while also being pretty boring. Using AI assistance here speeds up the mind-numbing part of replacing user facing text with labels in the code and mapping labels to user facing text in translation files.

While an engineer still needs to give their full attention to the review, which is still pretty dry work in my opinion, they would be done sooner and then be able to focus on more creative and novel tasks.

This is higher risk than using it for developer tools because if something goes wrong there is a chance the issue makes it to the customer. It would be a very easy issue to spot and would make us look sloppy. It is still lower risk than heavily AI-reliant feature development, as whatever goes wrong will be front and center and easier to fix. If it isn’t caught by the developer, it will most likely be caught by QA or anyone that looks at it. But with a few guardrails in place, like doing a thorough POC and identifying what can go wrong and how to validate when things are correct, using AI for this kind of work avoids developer burnout due to monotony.
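One guardrail of the “how to validate when things are correct” kind could be a check that every label in the reference translation file also exists in the other translation files. The sketch below is only an illustration of that idea. It uses plain objects rather than any i18next API, it assumes flat label keys (nested translation files would need flattening first), and the findMissingKeys name and the sample labels are invented.

```typescript
// Hypothetical translation resources: label key -> user facing text.
type Translations = Record<string, string>;

// For each locale, lists the label keys that exist in the reference
// locale but are missing from that locale's translations.
function findMissingKeys(
  reference: Translations,
  others: Record<string, Translations>,
): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const locale of Object.keys(others)) {
    const gaps = Object.keys(reference).filter(
      (key) => !Object.prototype.hasOwnProperty.call(others[locale], key),
    );
    if (gaps.length > 0) {
      missing[locale] = gaps;
    }
  }
  return missing;
}

// Example with made-up labels:
const en: Translations = { cartTitle: "Your cart", checkoutButton: "Check out" };
const fr: Translations = { cartTitle: "Votre panier" };
const report = findMissingKeys(en, { fr });
// report is { fr: ["checkoutButton"] }
```

A check like this only catches missing labels. It says nothing about whether the text is right, so the full attention review still has to happen.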

Summary

I hadn’t thought of the Anthropic article about the study since I read it in early February. I read it again while preparing to write this, just because I half remembered the following quote and wanted to cite it:

Cognitive effort—and even getting painfully stuck—is likely important for fostering mastery.

But reading it again made me realize that what it talks about holds true for my journey. My earlier attempts with LLMs fell in line with the low scoring patterns identified.

And I realized what I am doing right now aligns with the high scoring patterns.

AI rubber ducking aligns with “Conceptual inquiry”. Though I didn’t describe my workflow, the i18next grunt work POC example somewhat maps to “Hybrid code-explanation”.

While the sObject script grunt work example is still “AI delegation”, I feel that it is ok because there was no new skill to learn there.

The documentation approach has nothing to do with the Anthropic article and is what I am most excited about. I’ve come across the saying that brains are for having ideas, not holding them. I would hold ideas in my head out of laziness, only half committing to writing them down, but using AI for it solves the blank canvas problem for me, giving me something to work with.

Finally, at the start of the documentation section I told you that I was 3 hours into writing. I am just finishing at almost 7 hours. I can’t, or maybe I shouldn’t, spend this much time writing an ADR document at work. Drawing diagrams on a whiteboard and writing rough notes without much care for structure really speeds up the process, and arguing with AI keeps me engaged.