Cyrillic and e-readers

February 13, 2013

[Update 10/10/2014: The trick described below should still work, but there’s a better way. Calibre has been improved and can now convert files from Microsoft Word format to e-reader–friendly formats like .epub and .mobi. If you set the title of the piece as “heading 1” and chapter or section titles as “heading 2,” it should even auto-generate a table of contents that your device can recognize. I recommend this if you have large amounts of public domain text that you can put on the clipboard, and it’s also handy if you want to look over a draft of your own work on an e-reader.]

With every year I take less pride in being the kind of person who subscribes to a print newspaper and knows how to use a microfilm reader, and I am more willing to try out whatever new reading technology is offered. I spent a good part of this morning trying different ways of putting public domain books in Russian onto an e-reader, and I thought I’d share the results, which are no doubt subject to change and depend on your exact situation. In my case, I was going from old books on the web in .html format to a Kindle Touch.

1. Copy the text of the book into a word processor.

2. Save the file in .txt format, with Unicode encoding.

3. Use the e-book manager Calibre to convert the .txt file into a format your e-reader handles well, like .mobi or .epub. Use “edit metadata” in Calibre to set the author and title as you want them.

4. Send the .mobi or .epub file to your e-reader.
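If you have a lot of books to convert, steps 2 and 3 can be scripted. The Python sketch below is one way to do it, not what I actually did (I used a word processor and Calibre’s regular interface). It assumes the book text is already in a Python string. It writes that text out as UTF-8, which is one Unicode encoding; a word processor’s “Unicode” setting may mean a different one, such as UTF-16. It then builds a call to `ebook-convert`, the command-line converter that comes with Calibre, and uses its `--title` and `--authors` options in place of “edit metadata.” The function names and file names are my own placeholders.

```python
import subprocess
from pathlib import Path


def save_utf8_text(text: str, path: Path) -> Path:
    """Step 2: write the copied book text to a plain-text file encoded as UTF-8."""
    path.write_text(text, encoding="utf-8")
    return path


def calibre_command(txt_path: Path, out_path: Path, title: str, author: str) -> list[str]:
    """Step 3: build an ebook-convert call that also sets the title and author.

    Calibre picks the output format from out_path's extension (.epub or .mobi).
    """
    return [
        "ebook-convert",
        str(txt_path),
        str(out_path),
        "--input-encoding", "utf-8",  # state the encoding instead of letting Calibre guess
        "--title", title,
        "--authors", author,
    ]


def convert(txt_path: Path, out_path: Path, title: str, author: str) -> None:
    # Assumes Calibre's command-line tools are installed and on your PATH.
    subprocess.run(calibre_command(txt_path, out_path, title, author), check=True)
```

For example, `convert(Path("kniga.txt"), Path("kniga.epub"), "Книга", "Автор")` should produce an .epub. Change the second extension to .mobi if your device prefers that format.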

I found that these four steps gave me a pleasant-to-read text. If I tried to convert from an .rtf file instead of .txt, I got garbled Cyrillic (the Latin characters with diacritics that you used to see in web browsers set to the wrong encoding). I suspect a .txt file not saved as Unicode would be equally bad. Having Amazon convert a .doc or .docx file worked better, but not well: there were a lot of unnecessary and unwelcome line breaks. Converting from .pdf has always been awful, in my experience. When I tried to go directly from .html, I got legible but annoyingly formatted text, with every line underlined and centered.
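That garbled Cyrillic is the classic symptom of text being decoded with the wrong character encoding. I can’t say exactly which encodings got mixed up in the .rtf conversion, but this short Python example reproduces the general effect. A Russian word stored in the Windows Cyrillic encoding (cp1251) and read back as Western European Latin-1 turns into accented Latin letters. The same bytes read with the right encoding come out fine. The helper name `misread` is just my own label.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec and decode it with another, as a mis-set program would."""
    return text.encode(written_as).decode(read_as, errors="replace")


word = "Привет"

# Windows Cyrillic bytes interpreted as Latin-1: accented Latin gibberish.
print(misread(word, "cp1251", "latin-1"))  # Ïðèâåò

# The same bytes decoded with the encoding they were written in survive intact.
print(misread(word, "cp1251", "cp1251"))   # Привет
```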

Calibre is also handy if you need to convert books from .epub format to something else, like .mobi.
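The same `ebook-convert` tool handles .epub to .mobi. Here is a minimal sketch that converts every .epub in a folder and writes each .mobi next to the original. It assumes Calibre’s command-line tools are on your PATH, and the function names are placeholders of mine.

```python
import subprocess
from pathlib import Path


def epub_to_mobi_commands(folder: Path) -> list[list[str]]:
    """Build one ebook-convert call per .epub in folder, targeting a .mobi with the same name."""
    return [
        ["ebook-convert", str(epub), str(epub.with_suffix(".mobi"))]
        for epub in sorted(folder.glob("*.epub"))
    ]


def convert_all(folder: Path) -> None:
    # Runs the conversions one at a time and stops at the first failure.
    for cmd in epub_to_mobi_commands(folder):
        subprocess.run(cmd, check=True)
```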

For newer titles (and older ones before I figured out how to get the format right), I’ve also had good luck buying cheap e-books from an online store and getting them in the format I want directly. These have nice bells and whistles, like a built-in hyperlinked table of contents.

I see not much has changed since this 2011 comment thread at Lizok’s blog. Steven Lubman’s recommendation of Calibre meant nothing to me when I first read it, but now I can second it. I encourage anyone who knows more about this than I do to add their latest tricks in the comments here!

