AI for Converting Handwriting to Digital Text

ChatGPT became publicly available on 30 November 2022, and my software-engineer grandson mentioned it to me a few days later. Ever since, I have been amazed over and over at the power of these AI tools. In the months that followed, I moved from ChatGPT (OpenAI) to Claude (Anthropic) to Grok (xAI), switching from one to another and back again.

I first subscribed to ChatGPT, then cancelled that subscription when I moved to Claude. I have now cancelled Claude as well in order to move to SimTheory (https://simtheory.ai/). What prompted the switch is that this one service gives me access to many of the models that are otherwise spread across several sites. (I have listed those here.) I also still use several of the popular models directly in the “free” mode, often reaching the limit available to that tier. These include Gemini from Google and Copilot from Microsoft, and Gemini seems to be leading the pack at the moment.

I make extensive use of these various tools to support the software development I do for the service mission in which I am engaged for my church (The Church of Jesus Christ of Latter-day Saints). I have not settled on one particular model, and the progress I have seen over the past two-plus years has been incredible. Far more is possible now than at the beginning. Along the way, all the major systems have added capabilities such as accepting screenshots and multiple code files, along with an interactive process that is totally amazing.

My 56 years of programming experience is obviously important and has enabled me to learn how to put these tools to work. At this point it is an understatement to say that I am able to function at a level of software engineering that is FAR BEYOND what I would be able to do on my own. While the progress I have seen has been enormous, it has also been gradual, albeit rapid. With that, the leap forward I have seen today has been nothing if not incredibly astounding!

Indeed, this example is mind-boggling, to say the least. Rather than being related to programming, however, this development is in the area of handwriting recognition. This effort also started in 2023, with family records and journals. I use OCR in Adobe Acrobat Pro for scanned typed and printed documents. For handwriting, however, something else was needed. I discovered Transkribus, described by Wikipedia as “a platform for the text recognition, image analysis and structure recognition of historical documents” (see https://www.transkribus.org/). That system is amazing, but it requires scanning at least 50 pages of text and training on them.

About that same time, I was using Claude and tried it for handwriting recognition with a journal my wife, Annie, kept from 1977 to 1989. Filled with many stories of our experiences and adventures with our family, this is a veritable treasure that we want to share with our children and grandchildren. She is French, so it is written in French, in her beautiful script that takes some getting used to. Here is a page:

What has totally blown me away today is the comparison I just discovered between what Claude (and the others I tried as well!) produced in June of 2023 and what it created today. I put the two versions into a side-by-side Word document to show to Annie. Needless to say, she was impressed. Indeed, she reviewed and corrected the new transcription of the three full pages from her journal and found only 6 corrections that needed to be made in the 847 words from those pages.
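To put those numbers in perspective, here is a minimal sketch of the arithmetic in Python. It assumes each correction affects exactly one word, which may not be strictly true of Annie's review, and the function name `word_accuracy` is simply one I made up for illustration.

```python
def word_accuracy(total_words: int, corrections: int) -> float:
    """Return the percentage of words that needed no correction.

    Assumes one correction corresponds to one word (a simplification).
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= corrections <= total_words:
        raise ValueError("corrections must be between 0 and total_words")
    return 100.0 * (total_words - corrections) / total_words


# Figures from this post: 6 corrections in 847 words.
accuracy = word_accuracy(847, 6)
print(f"{accuracy:.1f}% of words needed no correction")  # prints 99.3%
```

Under that one-word-per-correction assumption, roughly 99.3% of the words came through untouched.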

When I thought about posting something about this amazing development, I remembered that not everyone reads French. To address that issue, I came up with the idea of using AI to create a side-by-side version showing how much the ability to recognize handwritten text has improved. I wanted to use colors to highlight the differences between the 2023 version and the one from today.

To do that, I submitted the Word document to AI, specifically this time to Gemini 2.5 Pro from Google. BTW, many experts are raving about this most recent effort from Google, which so far I am finding quite impressive. Gemini helped me come up with the comparison I wanted by providing the code to display the results on a page here in WordPress.
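For readers curious about how a color-coded comparison like this can work, here is a rough sketch in Python. To be clear, this is not the code Gemini gave me. It is only an illustration of the general idea, built on Python's standard `difflib` module. The function name `side_by_side_html`, the colors, and the little French sample are all made up for this example.

```python
import difflib
import html


def side_by_side_html(old_text: str, new_text: str) -> str:
    """Build a two-column HTML table that highlights word-level differences.

    Words found only in the old text get a red background, and words
    found only in the new text get a green background.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    old_cells, new_cells = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Escape the text so that characters like < and & display safely.
        old_chunk = html.escape(" ".join(old_words[i1:i2]))
        new_chunk = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            old_cells.append(old_chunk)
            new_cells.append(new_chunk)
        else:
            # "replace", "delete", or "insert": color whichever side has text.
            if old_chunk:
                old_cells.append(
                    f'<span style="background:#f8c8c8">{old_chunk}</span>')
            if new_chunk:
                new_cells.append(
                    f'<span style="background:#c8f0c8">{new_chunk}</span>')

    return ("<table><tr>"
            f"<td>{' '.join(old_cells)}</td>"
            f"<td>{' '.join(new_cells)}</td>"
            "</tr></table>")


# Invented sample text, not taken from the journal.
print(side_by_side_html("le chat noir", "le chien noir"))
```

The resulting HTML could be pasted into a Custom HTML block in WordPress. A word-by-word comparison is a simple choice that keeps the highlighting readable; a character-level comparison would be another option.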

I will write later about that process in a more detailed post, but the full comparison of these first three journal pages is available at this link. In the meantime, here is an excerpt from the first page from that comparison:

About Mike

I retired as a professor at Brigham Young University (BYU) in 2016, where I was Associate Professor of French and Instructional Psychology & Technology. I arrived there in 1992 after my retirement as a Lieutenant Colonel from a 20-year career in the US Air Force. Most of that time was spent on the faculty at the US Air Force Academy (USAFA), during what I call my first career. For over forty years I have been creating interactive video applications for supporting language. The lab at the Language Learning Center at USAFA engaged in ground-breaking efforts conducted within a mentored learning setting. The lab’s work involved the development of technologies and instructional design strategies for the use of video in the language acquisition process as well as with architectures that support online learning and facilitate learning about learning. I have a BA in Political Science from BYU, an MBA from the University of Missouri, and a PhD in Foreign Language Education and Computer Science from The Ohio State University. At the Air Force Academy I was a key member of the team that designed what was then the largest interactive videodisc-based learning center on a college campus. When I retired from BYU, I directed the ARCLITE Lab, which was involved in the creation of online learning materials for language learning as well as video and interactive technologies for learning.