I wanted to extract the text from a Kindle book that describes Tai Chi postures and turn the description of each posture into an easy-to-read numbered list that I could put on a WordPress page. The usual method would be to open each page, select the text with the mouse, and copy it using the Kindle copy button (Ctrl+A and Ctrl+C don’t work in the Kindle Reader), then open an AI, paste the contents in, and ask it to convert them. Then I would copy the converted text and paste it into a WordPress page. Unfortunately, the Reader doesn’t let you copy across pages, so this becomes very tedious.
I started by asking AI for solutions, but most of the ones it suggested were no better or cost money. Finally I chose to capture an image of each page and then run all of the images through the Linux OCR program “tesseract” to convert them into a single text file. This still required navigating to each page and taking a screen capture, but with the FireShot plugin this was pretty easy. FireShot has a shortcut (Alt+Shift+3) to do the capture, followed by a button click to save the image. FireShot handled the image naming, so I ended up with sequentially numbered images.
The AI then provided a bash script to feed all of the images into tesseract:
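# OCR every screenshot in natural numeric order (image2 before image10)
# and append the recognized text to combined.txt.
printf '%s\n' image*.png | sort -V | while IFS= read -r img; do
  tesseract "$img" stdout >> combined.txt
done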
Here is a portion of that file:
D Kindle Library
1 minute left in chapter
POSTURE 2
TAI CHI
Q Aa
Beginning
Chi Shih
Inhaling slowly, raise your arms upward to
shoulder height. The wrists should be bent, the
fingers hanging down, until your arms reach the
height of your shoulders (Photo 5). Then, as you
mobilize your ch'i, extend your fingers (6). Your
arms should ascend almost as if they were raised.
from above by something outside of yourself. Now
draw back your arms by bending your elbows and,
Page 28 of 162 + 25% 41 mins left in book
OQ
TAI CHI
Note the extraneous information, such as the page numbers and reading-time notices. I didn’t want to edit that out by hand, so I uploaded the file to an AI and gave it this prompt:
“The attached file is an OCR capture describing tai chi postures. Rewrite it so the descriptions are easy to read numbered lists, and remove the extraneous lines like ‘1 minute left in chapter’ etc.”
That gave me very clean copy to paste into a WordPress page.
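If you would rather strip the obvious Kindle Reader clutter before handing the text to an AI, a small grep filter can do part of the job. This is a minimal sketch, not what I actually used: the function name strip_kindle_chrome is hypothetical, and the patterns are guesses based only on the sample above. Stray OCR glyphs such as “OQ” or “Q Aa” are deliberately left for the AI to clean up.

```shell
#!/usr/bin/env bash
# Hypothetical pre-filter: remove Kindle Reader "chrome" lines from OCR output.
# The patterns below are assumptions based on one sample page; adjust as needed.
strip_kindle_chrome() {
  grep -Ev \
    -e 'Kindle Library$' \
    -e '[0-9]+ (minutes?|mins?) left in (chapter|book)' \
    -e '^Page [0-9]+ of [0-9]+' \
    -e '^TAI CHI$'
}

# Usage: strip_kindle_chrome < combined.txt > cleaned.txt
```

The last pattern drops the running header “TAI CHI”, which is safe only if the book title never appears alone on a line in the body text.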
I was almost done. I had downloaded a video of the entire Tai Chi form, and I wanted to split it into the individual postures, so I asked the AI for a tool that would let me mark all of the split points and export every posture in one step. It recommended the Linux program LosslessCut and gave me the command line for installing it, and I used it to make a short video clip for each posture.
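For anyone who prefers a script to a GUI, the same “mark the split points, export everything at once” idea can be sketched with ffmpeg’s stream copy, which cuts without re-encoding. This is a swapped-in technique, not how LosslessCut is driven. The function names split_clips and run_or_print, the cut-list format (one clip per line as START END NAME, with times in seconds), and the MP4 output are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical scripted alternative to LosslessCut: cut one video into
# per-posture clips with ffmpeg stream copy (no re-encoding).
# Assumed cut-list format: one clip per line, "START END NAME",
# with START and END in seconds from the beginning of the video.

# Run a command, or just print it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@" < /dev/null   # keep ffmpeg from swallowing the cut list on stdin
  fi
}

split_clips() {
  local input="$1" cutlist="$2" start end name dur
  while read -r start end name; do
    [ -z "$start" ] && continue   # skip blank lines
    # -t expects a duration, so compute END - START.
    dur=$(awk -v s="$start" -v e="$end" 'BEGIN { print e - s }')
    run_or_print ffmpeg -hide_banner -loglevel error \
      -ss "$start" -i "$input" -t "$dur" -c copy "$name.mp4"
  done < "$cutlist"
}
```

For example, `DRY_RUN=1 split_clips form.mp4 cuts.txt` prints one ffmpeg command per posture so you can check them before running for real. The ffmpeg documentation notes that with stream copy the segment between the nearest preceding seek point and the requested start is kept, so a clip may begin slightly earlier than asked.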
The last thing I wanted was a way to translate the page from English into Spanish. Once again I asked an AI for the best solution; it recommended the WordPress plugin TranslatePress and then gave detailed instructions on setting it up to use Google’s Translation API.
So, with the help of AI, I took a job that would have taken days to complete and did most of the work in a couple of hours. The results are at Forma de postura 37 del estilo Yang.