Ford had to hire back former engineers to fix mistakes made by its automated systems

VetOfTheSeas@discuss.online · 1 day ago

Caveman@lemmy.world · 1 day ago

I’ve been using an LLM for programming for the last 6 months and it needs constant babysitting. It basically just does the most straightforward thing without considering nuance, maintainability, or whether something should actually be split into a module. This is very much not surprising.

kossa@feddit.org · 14 hours ago

Yep. I realized that the best use case is just using an LLM as a supercharged StackOverflow, where the answers are based on my specific problem.

I just use it as chat and have it configured to give me at least two options per answer, preferably three. Then I still feel in charge. And if I like a proposed solution, I only take it over line by line and tweak it to my liking. So I still “wrote” the code and would rightfully feel responsible for the result (again, as if I had used StackOverflow).

Never ever would I let it go on a rampage as an agent on my codebase. That’s terrible 😱.

Lovable Sidekick@lemmy.world · 20 hours ago

The active dev I know who uses it daily says it drastically reduces the time he spends on routine tasks and sometimes comes up with novel approaches, but he definitely has to check everything. The problem is inexperienced or non-experienced people thinking it’s a magic lamp you can rub and it poofs out an expert programmer.

jj4211@lemmy.world · 21 hours ago

It’s obnoxious enough to use for myself: sometimes useful, but the code is a chore to review and it’s just constant screwups, except for exceedingly boilerplate stuff or stuff that can tolerate some sloppiness (e.g. an LLM makes it easy to indicate some variables to get from argv, and it does the tedium of that plus the help text plus the man page edits, and generally does that fine). Even if it doesn’t screw up obviously, if the code is verbose, I know a screwup is lurking, so I just ditch it and do it myself.
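For anyone wondering what that argv tedium looks like, here is a minimal sketch assuming Python's standard `argparse` module. The tool name `convert` and all of its options are hypothetical, made up only for illustration; the comment above does not say which language or tool was involved.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical tool and option names, chosen only for illustration.
    parser = argparse.ArgumentParser(
        prog="convert",
        description="Convert an input file to another format.",
    )
    parser.add_argument("input", help="path of the file to read")
    parser.add_argument(
        "-o", "--output", default="out.bin",
        help="path of the file to write (default: %(default)s)",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="print progress messages",
    )
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    # argparse builds the --help text from the declarations above,
    # so the variables and their help strings stay in one place.
    return build_parser().parse_args(argv)
```

Calling `parse_args(["data.csv", "-v"])` gives a namespace with `input`, `output` and `verbose` set. It is repetitive, low-risk code where a small slip is easy to spot.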

However, the real pain comes in when other people use it. Just today someone had an issue. Normally they’d ask a developer for help and offer appropriate debug information and/or access. However, they “just had Claude do it, even used Opus 4.8 to make sure it’s good”, and it generated a very verbose report on the issue, why it went wrong, and the appropriate change to make it work. It was very detailed and the explanation sounded quite reasonable. The problem was that it was horribly and absolutely wrong, a fictional rationalization of a bad code change. It made a change that happened to appear to work for him, but in reality it replaced a failure due to unrecognized data with silent corruption of the data in a facet the user specifically did not care about. Claude claimed it was mapping the unrecognized data correctly, but it just made up a completely untethered conversion based on nothing. Now, I could tell at a glance that the explanation and code change were bullshit, but it became an argument because the user wouldn’t give me actionable debug details because “he already had Claude fix it”. I had to keep trying to find holes in the Claude rationalization that the user would also recognize, and he sided with Claude four times until the fifth problem in Claude’s explanation finally stuck (it asserted that the problem was due to running a specific outdated version of a specific piece of software; the problem being that that specific version never even existed, the minimum “good” version was 10 years old, and the version the user was running was about a month old).
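The difference between failing on unrecognized data and silently corrupting it can be sketched roughly as follows. This is an illustration in Python, not the actual code from the incident; the lookup table, the status codes and both function names are invented assumptions.

```python
# Hypothetical lookup table; the codes and names are invented for illustration.
KNOWN_CODES = {"A": "active", "S": "suspended", "C": "closed"}


def map_status_strict(code: str) -> str:
    """Fail loudly on unrecognized data so the problem stays visible."""
    try:
        return KNOWN_CODES[code]
    except KeyError:
        raise ValueError(f"unrecognized status code: {code!r}") from None


def map_status_lenient(code: str) -> str:
    """Anti-pattern: invent a value for unrecognized data.

    The call no longer fails, but the returned value is not derived
    from the input, which is silent corruption.
    """
    return KNOWN_CODES.get(code, "active")
```

With an unknown code such as `"X"`, the strict version raises `ValueError`, while the lenient version returns `"active"` and everything appears to work. Anyone who never looks at that field would not notice the difference.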

I don’t understand how people get this far and still don’t understand that AI is much better at sounding plausible than being correct.

bitjunkie@lemmy.world · 1 day ago

It’s a bottom-dollar offshore team that can type faster. It depends on the situation whether that’s something that’s worth the time and effort to manage.

IphtashuFitz@lemmy.world · 1 day ago

Yeah, I use it for some programming tasks as well. I’m sick and tired of telling it that it did something wrong or simply omitted something, only to have it apologize and offer to fix its own mistakes.

punkfungus@sh.itjust.works · 21 hours ago

You should be glad it’s apologizing. On the occasions I’ve used it to try to actually write code for me it’s had a tendency to blame me for its mistakes.

It writes a function that gets stuck in an infinitely recursive loop that never exits, I point it out and it’s all “Aha! You’ve fallen for a classic recursion trap!” What do you mean I’ve fallen for it?
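For reference, the usual cause of that kind of bug is a missing base case. A minimal sketch in Python, with made-up function names, not the code the LLM actually produced:

```python
def countdown_broken(n: int) -> list[int]:
    # No base case: every call recurses again, so in CPython this
    # ends with RecursionError instead of returning a list.
    return [n] + countdown_broken(n - 1)


def countdown(n: int) -> list[int]:
    # Base case: stop recursing once n drops below zero.
    if n < 0:
        return []
    return [n] + countdown(n - 1)
```

The only difference between the two is the `if n < 0` check, and that check is the responsibility of whoever wrote the function, not whoever called it.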

Between those experiences and seeing the hot garbage some of my coworkers vibe coded, it was enough for me to relegate LLMs purely to the “ask questions that you would have searched for on StackOverflow” role. And it frustrates me that search was made so impotent that it’s not a real option to avoid the LLM entirely. The multiple answers and perspectives on SO were often really valuable.

tempest@lemmy.ca · 1 day ago

The major problem is that the work of an LLM has a massive number of hidden ‘assumptions’ that you need to be aware of. If you don’t already have a good working knowledge of the task, you won’t have an intuition about those assumptions. It’s annoying.

jj4211@lemmy.world · 21 hours ago

Yeah, and for those who don’t know, the rationalization output of the LLM is just so persuasive. It sounds quietly confident and rattles off things that sound like real details.

People are believing the LLM output over actual human experts and the human experts have to expend non-trivial effort trying to disprove an LLM output before they can get on with the business of doing it right.

Caveman@lemmy.world · 1 day ago

Yeah, like cards being the best thing ever on front-end. If I want a basic layout I put “avoid cards” usually because then it’ll use them sparingly. But yeah, it’s just autocomplete so it’s always going to get the lowest common denominator code based on what it decided to look at.

I’m now looking into finding something faster than GPT/Claude where I can ping pong prompt faster and get what I want. Whatever it writes first is just a first draft anyway.