LLM Uses Metaphors to Explain Problem

I am developing software using VB.net and the OpenXML library to substitute translations and screen captures into a master Microsoft Word document in English. The purpose is to create new, target-language versions of the original document. Text has its challenges (multiple runs for a single string of formatted text, to name just one). Graphics, on the other hand, take things to a whole new level of complexity. Replacing an image requires the software to understand the layout information of the original image before inserting the new one, and that gets very tricky.
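For context, that layout information lives in the document XML itself: an inline image in `word/document.xml` carries a standard `wp:extent` element giving its displayed size in EMUs (914,400 per inch). My working code is in VB.net, so the following is only an illustrative Python sketch of reading that element; the XML fragment is a trimmed-down stand-in, not taken from my actual documents.

```python
# Minimal sketch: read the displayed size of an inline Word image from the
# standard wp:extent element. EMU = English Metric Unit, 914,400 per inch.
import xml.etree.ElementTree as ET

WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
EMU_PER_INCH = 914400

# Trimmed-down stand-in for what word/document.xml contains for an inline image.
fragment = f"""
<wp:inline xmlns:wp="{WP_NS}">
  <wp:extent cx="3657600" cy="2743200"/>
</wp:inline>
"""

def read_extent_inches(xml_text: str) -> tuple[float, float]:
    """Return the (width, height) of the drawing in inches."""
    root = ET.fromstring(xml_text)
    extent = root.find(f"{{{WP_NS}}}extent")
    return (int(extent.get("cx")) / EMU_PER_INCH,
            int(extent.get("cy")) / EMU_PER_INCH)

print(read_extent_inches(fragment))  # → (4.0, 3.0)
```

Preserving those two numbers when swapping in the new image is the crux of the sizing problem described below.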

I had been working with Gemini 2.5 Pro in developing the software, but the images were being sized incorrectly upon insertion into the new document. After one version that distorted the images upon insertion, the next version had images extending beyond the edge of the page like this:

That portion of the page should in fact look more like this:

Even that version, however, is not ideal. In the software being documented, the captured window does not occupy such a significant portion of the screen, so rendering it that large wastes space in the final document.

So, after that version, I returned to Gemini 2.5 Pro with this prompt:

That works better, but some of the graphics were captured at such a high resolution that they fill the whole page. Give me no new code, but give me some ideas to think about. I am thinking that I might also need to furnish you with the actual XML of the file so you can see what we might use. In the original version, the images were sized for a better page layout, so there must be some way to get that info.

Beyond wording in the response that was sycophantic at times, I found the use of metaphors rather intriguing:

  • “The ‘scale to page width’ logic is a sledgehammer. It’s a great fallback, but what we really need is the scalpel—the exact dimensions the original author intended.”
  • “Golden Ticket”
  • “we were essentially replacing the engine of a car but leaving the original chassis, wheels, and body. The new engine was too big for the chassis.”
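The “sledgehammer” fallback Gemini refers to can be sketched in a few lines. This is not my production code, just an illustration of the arithmetic, and the page geometry (an 8.5-inch page with 1-inch margins) is an assumption:

```python
# The "scale to page width" fallback, sketched: convert a replacement image's
# pixel size to EMUs and, if it is too wide, scale both axes down so it fits
# the usable page width while preserving the aspect ratio.
EMU_PER_INCH = 914400
USABLE_WIDTH_EMU = int(6.5 * EMU_PER_INCH)  # assumed: 8.5" page, 1" margins

def fallback_extent(px_w: int, px_h: int, dpi: int = 96) -> tuple[int, int]:
    """Convert a pixel size to EMU extents, capped at the usable page width."""
    cx = px_w * EMU_PER_INCH // dpi
    cy = px_h * EMU_PER_INCH // dpi
    if cx > USABLE_WIDTH_EMU:              # too wide: scale both axes down
        cy = cy * USABLE_WIDTH_EMU // cx
        cx = USABLE_WIDTH_EMU
    return cx, cy

# A 1920x1080 capture at 96 dpi would be 20 inches wide; the cap brings it
# back to 6.5 inches.
print(fallback_extent(1920, 1080))
```

The “scalpel,” by contrast, is to skip this arithmetic entirely and reuse the original `wp:extent` values, which is exactly the information Gemini suggested recovering from the file's XML.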

Here is the excerpt from the exchange that contains the use of metaphors:

Wanting to reflect more on what I was seeing there, I asked Anthropic’s Claude:

Is the use of metaphors more than clever syntax?

The full response was quite interesting and seemed to confirm what I was thinking: LLMs often appear to be doing more than simply placing the next word in a sentence according to statistical probabilities! Here is the final paragraph from Claude’s response:

So while metaphors can certainly serve as elegant rhetorical devices, their primary significance lies in their role as cognitive tools that structure thought, enable conceptual understanding, and mediate between abstract and concrete domains of experience.

Now, on to the next version of my code to replace images!

Note: I use Simtheory.ai to access all the primary LLM engines, which I highly recommend: for one reasonable fee, the subscriber has access to many models.

Posted in Uncategorized

Do Thinking Machines Actually Think?

I was intrigued as I perused this article this morning, “Thinking Machines That Don’t: Confronting AI’s Biases & System Flaws” by Markus Bernhardt. As I read, it was clear that the author’s goal was to support a claim often heard about the artificial intelligence (AI) tools that have come on the scene over the past two years. The article states, “At their architectural heart, LLMs are not thinking entities but extraordinarily sophisticated sequence processors.” Specifically, the claim is that large language models merely place one word after another in an order derived from the word-occurrence probabilities they acquired during training on huge amounts of data.

I submitted the piece to xAI’s Grok and requested that it summarize the piece in a single paragraph:

Markus Bernhardt’s article, Thinking Machines That Don’t: Confronting AI’s Biases & Systemic Flaws (July 7, 2025), critiques the limitations of large language models (LLMs), arguing that their fluent, confident outputs create a deceptive illusion of understanding despite lacking genuine cognition, intentionality, or a persistent world model. It highlights the “paradox of neutrality,” where efforts to make LLMs unbiased result in a bias toward acquiescence, reinforcing dominant norms and flattening complex considerations, as seen in corporate and educational settings. The article also points to LLMs’ tendency to perpetuate prevalent discourse due to their statistical foundations and their “contextual collapse,” which applies a uniform approach across diverse knowledge domains, risking epistemological missteps. While promising a follow-up on persistent error patterns, the piece calls for critical, human-centric use of LLMs to counter their inherent flaws and prevent over-reliance in high-stakes contexts.

I also asked Grok to assess the piece and received this rather lengthy commentary that included this:

Conclusion

Bernhardt’s article is a necessary wake-up call, urging us to look beyond LLMs’ polished outputs and confront their systemic flaws: the illusion of understanding, the paradox of neutrality, prevalence reinforcement, and contextual collapse. These critiques are grounded in the technical reality of LLMs as statistical pattern-matchers, not cognitive entities. While the article slightly overgeneralizes the risks and underplays potential mitigations, its core claims are valid and urgent, especially as LLMs proliferate in critical domains. It’s a reminder that tools like me are powerful but fallible, requiring active, critical engagement from users to avoid being misled by our confident facades. The promised follow-up on error patterns should further enrich this discussion, and I look forward to reacting to it when it’s published.

It is interesting to note what some might feel is a remarkable sense of self-awareness in the statement, “It’s a reminder that tools like me are powerful but fallible… The promised follow-up on error patterns should further enrich this discussion, and I look forward to reacting to it when it’s published.”

While I have no substantive rebuttal to the idea that these systems do not “think,” I have to say that I am often amazed at the non-trivial insights provided by any one of the several LLMs I use on a daily basis. Also interesting is the fact that my wife, a non-user of LLMs, frequently agrees with me when I share one of those responses with her.

Posted in Generative AI, Large Language Models

AI for Creating Lists of Terms for Software Manuals and Translations

User manuals for software typically make use of the terms that appeared during screen design for the software. Lists of those terms are especially useful if the software is to be internationalized. For the documentation project I am working on, however, I was unable to come up with such a list.

Casting about for a way to proceed without a list of terms, I contemplated simply compiling a list from the documentation itself, which I did in French. The results were less than satisfactory. My thinking then turned to having the translators for each of the three languages review the first section, in hopes that the process might suggest the terms to be added. That effort also met with challenges.

Then I remembered the 106 screenshots that we have for the software in all four languages, and a reasonable approach began to take shape. Returning to the idea of collecting the needed terms from those screenshots, I quickly realized that doing so manually would be a daunting task, one also fraught with the possibility of many mistakes.

I recently posted about my project to do automatic handwriting recognition on a journal my wife kept for many years while our children were young. After setting that work aside two years ago because of the challenges I encountered, I decided to try again a couple of weeks ago and was nothing short of astounded at what I found.

With that in mind last week, and wanting a solution for the list of terms I needed, it hit me that I could have generative AI tools do optical character recognition on the 106 screenshots in four languages to produce the needed list. After a couple of quick, successful experiments submitting screenshots via the Web interface and retrieving the needed text, I was impressed. That success, however, also made clear what it would take to repeat the process for all 424 images involved. What was needed was a program to submit the screenshots, retrieve the results, and then match the terms from English to the other three languages.

I then worked with Google’s Gemini to set up access to the system’s API (Application Programming Interface). Gemini helped me create a program in Python to submit the 424 images to Gemini’s large language model, which returned a total of over 2,000 terms in each language. Then came the daunting task of matching up those terms across languages. I had a couple of unsuccessful attempts, the result of asking the AI to match terms in two lists of over 2,000 items each. I suggested a divide-and-conquer approach, but that did not help. The first couple of attempts met with total failure. Indeed, one of the AI systems simply provided some code and a dozen terms. Another responded that it could do what was needed, but then abandoned the task with no further explanation. Yet another (the Chinese DeepSeek system running on Groq hardware to accelerate processing) generated an incredibly lengthy explanation of the difficulty of what I was asking. Its response to the same prompt I had supplied the other systems began with this:

Okay, so I need to figure out how to solve this problem where I have to calculate the number of ways to arrange some objects with certain constraints. Let me try to break it down step by step. First, I remember that when dealing with permutations and combinations, the key is to understand whether the order matters and if there are any restrictions. In this case, the problem involves arranging objects where some are identical and others are different, and there are specific constraints on their positions. Hmm, so let’s say I have a total of n objects. Out of these, there are some identical objects and some that are distinct. For example, maybe I have 3 identical red balls and 2 distinct blue balls. The problem might ask how many ways I can arrange these balls in a line or a circle, considering that some positions are restricted. Wait, the user didn’t specify the exact problem, but I think I should consider a general approach. Let me think of a common scenario. Suppose we have n objects where there are duplicates and some specific constraints, like certain objects must be placed in particular positions or can’t be placed next to each other.

That was the first of FOURTEEN pages of explanation! Needless to say, more work was needed on my approach. 😊
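The extraction step itself, at least, was straightforward. The sketch below shows roughly what the per-screenshot submission looks like; the model name, prompt wording, and folder layout are my assumptions here, not the exact ones from my project, and only the reply-parsing helper is exercised without an API key:

```python
# Hedged sketch of the extraction step: submit one screenshot to the Gemini
# API and split the line-oriented reply into a term list. parse_terms is the
# only part that runs without network access and an API key.
from pathlib import Path

PROMPT = "List every user-interface term visible in this screenshot, one per line."

def parse_terms(reply: str) -> list[str]:
    """Turn the model's line-oriented reply into a clean list of terms."""
    return [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]

def ocr_screenshot(path: Path) -> list[str]:
    import google.generativeai as genai  # deferred import: needs an API key
    model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name
    image = {"mime_type": "image/png", "data": path.read_bytes()}
    return parse_terms(model.generate_content([PROMPT, image]).text)

if __name__ == "__main__":
    # Assumed folder layout: one directory of PNG screenshots per language.
    for shot in sorted(Path("screenshots/en").glob("*.png")):
        print(shot.name, ocr_screenshot(shot))
```

The hard part, as the failures above show, was never getting terms out of one screenshot; it was matching thousands of them across four languages afterward.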

The good news is that the failures I encountered prompted me to come up with an approach that worked. I returned to Gemini for help creating Python code to submit the English screenshots one at a time, matching the terms across the other three languages as it went. The processing included saving the results into an Excel file, which ended up with 2,382 entries. After using Excel to remove the duplicates, 368 items remained, with some cleanup still needed. Et voilà!
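The idea that made this work can be sketched simply: handle one screenshot at a time, on the assumption that the term lists extracted for that screenshot come back in the same order in every language, then drop duplicate rows at the end. This is a simplified stand-in for the AI-generated script (which wrote an Excel file; CSV keeps the sketch dependency-free), and the non-English language codes are placeholders:

```python
# Simplified sketch of the matching step that finally worked: zip the
# per-screenshot term lists positionally, accumulate rows, then remove
# duplicate rows while preserving first-seen order.
import csv

def match_screenshot(en, fr, es, pt):
    """Pair one screenshot's term lists into (en, fr, es, pt) rows."""
    return list(zip(en, fr, es, pt))

def dedupe(rows):
    """Remove duplicate rows, preserving first-seen order."""
    return list(dict.fromkeys(rows))

rows = dedupe(match_screenshot(["File", "Edit", "File"],
                               ["Fichier", "Modifier", "Fichier"],
                               ["Archivo", "Editar", "Archivo"],
                               ["Arquivo", "Editar", "Arquivo"]))
with open("terms.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([("en", "fr", "es", "pt"), *rows])
print(rows)
```

Matching within a single screenshot keeps each list short enough that the positional pairing is reliable, which is exactly what the whole-list approach lacked.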

The following screenshot illustrates user error on my part: I used an up-to-date set of English screenshots, but the screenshots for the other languages were out of date.

This once again proves the old GIGO adage, “Garbage in, garbage out!”

Posted in Generative AI, Large Language Models

AI for Converting Handwriting to Digital Text

ChatGPT became publicly available on 30 November 2022, and my software-engineer grandson mentioned it to me a few days later. Ever since, I have been amazed over and over at the power of these incredible AI tools. In the time since, I have shifted from ChatGPT (OpenAI) to Claude (Anthropic) to Grok (xAI), moving over time from one to the other and back again.

I first subscribed to ChatGPT, which I later cancelled when I moved to Claude. I have now cancelled that subscription as well to move to SimTheory (https://simtheory.ai/). The switch was prompted by the access this one service gives to many of the models available today from several providers. (I have listed those here.) I also still use several of the popular models directly in the “free” mode, often reaching the limit available to that tier. These include Gemini from Google and Copilot from Microsoft, and Gemini seems to be leading the pack at the moment.

I make extensive use of these various tools to support the software development I do for the service mission in which I am engaged for my church (The Church of Jesus Christ of Latter-day Saints). I have not settled on one particular model, and the progress I have seen over the past two-plus years has been incredible. It is amazing to see how much more is possible now than at the beginning. Since then, all the major systems have added capabilities like the submission of screenshots and multiple code files, along with an interactive process that is totally amazing.

My 56 years of programming experience is obviously important and has enabled me to learn how to put these tools to work. At this point it is an understatement to say that I can function at a level of software engineering FAR BEYOND what I could reach on my own. While the progress I have seen has been enormous, it has also been gradual, albeit rapid. Even so, the leap forward I saw today was nothing short of astounding!

Indeed, this example is mind-boggling, to say the least. Rather than being related to programming, however, this development is in the area of handwriting recognition. That effort also started in 2023, with family records and journals. I use the OCR in Adobe Acrobat Pro for scanned typed and printed documents. For handwriting, however, something else was needed. I discovered Transkribus, described by Wikipedia as “a platform for the text recognition, image analysis and structure recognition of historical documents” (see https://www.transkribus.org/). That system is amazing, but it requires scanning and training on at least 50 pages of text.

About that same time, I was using Claude and tried it for handwriting recognition with a journal my wife, Annie, kept from 1977 to 1989. Filled with many stories of our experiences and adventures with our family, this is a veritable treasure that we want to share with our children and grandchildren. She is French, so it is written in French, in her beautiful script that takes some getting used to. Here is a page:

What totally blew me away today is the comparison I just made between what Claude (and the others I tried as well!) produced in June of 2023 and what it created today. I put the two versions into a side-by-side Word document to show to Annie. Needless to say, she was impressed. Indeed, she reviewed and corrected the Word document covering three full pages from her journal and found only 6 corrections needed in its 847 words.

When I thought about posting something about this amazing development, I remembered that not everyone reads French. To address that, I came up with the idea of using AI to create a side-by-side display showing how significantly the ability to recognize handwritten text has improved, using colors to mark the differences between the 2023 version and today’s.

To do that, I submitted the Word document to AI, specifically this time to Gemini 2.5 Pro from Google. BTW, many experts are raving about this most recent effort from Google, which so far I am finding quite impressive. Gemini helped me come up with the comparison I wanted by providing the code to display the results on a page here in WordPress.
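The kind of comparison display Gemini helped me build can be sketched with Python's standard difflib: mark, word by word, where the 2023 transcription differs from the current one, wrapping the changed words in colored HTML spans for the WordPress page. The styling below is illustrative, not the exact markup from my site:

```python
# Sketch of a colored word-level comparison: show the new transcription,
# highlighting words that differ from the old one in green spans.
import difflib
import html

def color_diff(old: str, new: str) -> str:
    """Return the new text as HTML, highlighting words changed since `old`."""
    new_words = new.split()
    sm = difflib.SequenceMatcher(a=old.split(), b=new_words)
    out = []
    for op, _i1, _i2, j1, j2 in sm.get_opcodes():
        chunk = html.escape(" ".join(new_words[j1:j2]))
        if op == "equal":
            out.append(chunk)
        elif chunk:  # replaced or inserted words, shown in green
            out.append(f'<span style="color:green">{chunk}</span>')
    return " ".join(filter(None, out))

print(color_diff("le chat et dormi", "le chat a dormi"))
```

Running the two transcriptions of a journal page through a function like this yields HTML that can be pasted straight into a WordPress page.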

I will write later about that process in a more detailed post, but the full comparison of these first three journal pages is available at this link. In the meantime, here is an excerpt from the first page from that comparison:

Posted in Geeky Stuff, Generative AI, Large Language Models