For the past few months, I've been using the Android note-taking app LectureNotes to keep a digital lab journal on my Lenovo Yoga Book.
To enable me to more easily share my notebook with my colleagues when the need arises, I would like to automatically convert my records to PDF format. While the app has a PDF export feature, I'd much rather have a PDF generated every day without manual intervention. To do this, I have to reverse engineer the file format LectureNotes uses to store my notebooks.
Thankfully, LectureNotes stores its data on the file system on the tablet's internal storage (at /Android/), making it easy to back up and relatively easy to parse.
Notebooks can optionally be organised into folders. As far as I can tell, this only goes one level deep; that is to say, there cannot be folders within folders. On disk, every folder and every notebook is a directory containing some metadata in XML files. The file system layout looks something like Figure 1.
The name of the directory is the name of the folder or notebook, respectively. The XML files contain:
- various metadata about the cover shown on the notebooks board (colour and so on), which is hardly of relevance for PDF generation
- information on the state of the app, such as which page and layer were last in focus (again, I don't care)
- some important ‘structural’ information about the notebook:
The page size in pixels:
Details about the page background:
<paperpattern>1</paperpattern>
<papercolor>16777215</papercolor>
<patterncolor>12632319</patterncolor>
Settings for the text layer:
<textlayersettings>1</textlayersettings> <!-- always 1? -->
<textlayerfontfamily>0</textlayerfontfamily> <!-- no idea -->
<textlayerfontstyle>0</textlayerfontstyle>
<textlayerfontsize>61.0</textlayerfontsize>
<textlayerfontcolor>-16777216</textlayerfontcolor>
<textlayerleftmargin>0.005</textlayerleftmargin>
<textlayertopmargin>0.005</textlayertopmargin>
<textlayerrightmargin>0.005</textlayerrightmargin>
<textlayerbottommargin>0.005</textlayerbottommargin>
Colour and style are described below. The font size appears to be 4/3 of the font's em size in pixels. I have no idea whether this metric was chosen deliberately, or if it's an accident of implementation. Perhaps it has something to do with the line height? The margins are clearly fractions, probably of the page width and height. I have no idea how to interpret the first two settings.
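To make these guesses concrete, here is a minimal Python sketch that turns the text layer settings into pixel values. It assumes the 4/3 factor holds, that the left and right margins are fractions of the page width, and that the top and bottom margins are fractions of the page height. The function name and the 1000 x 2000 page size are made up for illustration.

```python
def text_layer_geometry(page_width, page_height, font_size,
                        left, top, right, bottom):
    """Convert text layer settings to pixels (hypothetical helper)."""
    # The stored size appears to be 4/3 of the em size, so invert that factor.
    em_px = font_size * 3.0 / 4.0
    # Assumption: left/right are fractions of the width, top/bottom of the height.
    x0 = left * page_width
    y0 = top * page_height
    x1 = page_width - right * page_width
    y1 = page_height - bottom * page_height
    return {"em_px": em_px, "text_box": (x0, y0, x1, y1)}


# Font size and margins as in the settings shown above; the page size is invented.
geometry = text_layer_geometry(1000, 2000, 61.0, 0.005, 0.005, 0.005, 0.005)
print(geometry["em_px"])  # 45.75
```

If the margins turn out to be relative to something else (the shorter page side, say), only the four lines computing the box need to change.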
Some information on layering:
<layers>2</layers>
<displayedlayers>3</displayedlayers>
<textlayer>2</textlayer>
<displaytextlayer>1</displaytextlayer>
In this case, there are two bitmap layers, plus the (forever single) text layer. All three layers are visible, and the text layer is between bitmap layer 1 and bitmap layer 2. When the text layer is below layer 1, <textlayer> is 1. When the text layer does not exist, <displaytextlayer> is still set, and <displayedlayers> doesn't count it!
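One way to read the layering values is that <textlayer> gives the 1-based position of the text layer in the stack, counted from the bottom. The following Python sketch builds the bottom-to-top drawing order under that assumption; the function name and the layer labels are invented for illustration, and the visibility settings are deliberately left out.

```python
def stacking_order(layers, textlayer, has_text=True):
    """Return layer labels from bottom to top (hypothetical helper)."""
    # Bitmap layers are numbered from 1, bottom first.
    order = [f"bitmap{i}" for i in range(1, layers + 1)]
    if has_text:
        # Assumption: textlayer is the text layer's 1-based slot in the stack.
        order.insert(textlayer - 1, "text")
    return order


print(stacking_order(2, 2))  # ['bitmap1', 'text', 'bitmap2']
print(stacking_order(2, 1))  # ['text', 'bitmap1', 'bitmap2']
```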
Some things that sound possibly important but are completely opaque:
Notebook file layout
For every page, there are files called
uuidN.txt. If a page contains multiple bitmap layers, there is an additional
PNG file for each layer, named
page1_2.png and so on. These images have
transparent backgrounds and equal size; they can simply be stacked on top of
one another (… if there's no text).
If there is text in the main text field on a page, there will be a file
textN.txt. If, and only if, text styles have been changed (vis-à-vis the default), there is
a corresponding file
textN.style. If there are supplementary text fields
on a page, there will in addition be files called
and if necessary
textN_1.style. More on all of these later.
If keywords have been set on a page, there will be a file
keyN.txt. If there
are multiple keywords, these are separated by line feeds.
Opening the notebook overview generates thumbnails of all the (rendered) pages,
which are stored as
thumbnailN.png in the same directory. If the notebook
overview has not been opened, these will not exist.
The notebook directory also always contains an empty PNG image of the right size,
empty.png, and a small (84x84) icon for the notebook.
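A converter first has to sort the files in a notebook directory by page. The Python sketch below does this with a regular expression over the file names. The prefixes and extensions are the ones mentioned in this post; treating the name of the first bitmap layer as following the same pattern as page1_2.png, just without the layer suffix, is an assumption. The function name is made up.

```python
import re
from collections import defaultdict

# prefix, page number, optional sub-index (layer or extra text field), extension
_NAME = re.compile(
    r"^(page|text|key|uuid|thumbnail)(\d+)(?:_(\d+))?\.(png|txt|style|box)$"
)


def group_by_page(filenames):
    """Map page number -> list of (kind, sub_index, extension) tuples."""
    pages = defaultdict(list)
    for name in sorted(filenames):
        match = _NAME.match(name)
        if match is None:
            continue  # empty.png, the notebook icon, XML metadata, ...
        kind, page, sub, ext = match.groups()
        pages[int(page)].append((kind, int(sub) if sub else None, ext))
    return dict(pages)


files = ["page1_2.png", "text1.txt", "text1.style", "key1.txt",
         "text2_1.box", "thumbnail1.png", "empty.png"]
print(sorted(group_by_page(files)))  # [1, 2]
```

In practice the list of names would come from something like os.listdir() on the notebook directory.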
text*.txt files are easy enough to understand: they simply contain plain
text, without any formatting or markup. Beyond that, it gets a little more
complicated.
‘Box’ files, e.g.
text2_1.box, contain four numbers
(example 1), separated by line feeds. The first two are
the fractional X and Y distance of the top-left corner of the box from the top-left
corner of the page. The others might be the fractional offsets of the bottom-right
corner from the bottom-right corner of the page.
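If that reading is right, a box file can be turned into a pixel rectangle as in the Python sketch below. It assumes the four values are in the order left, top, right, bottom, and that the last two are measured inwards from the bottom-right corner of the page, which is only my guess. The function name and the numbers in the example are invented.

```python
def box_to_pixels(box_text, page_width, page_height):
    """Return (x0, y0, x1, y1) in pixels for the contents of a box file."""
    left, top, right, bottom = (float(v) for v in box_text.split())
    x0 = left * page_width
    y0 = top * page_height
    # Assumption: the last two values are offsets from the bottom-right corner.
    x1 = (1.0 - right) * page_width
    y1 = (1.0 - bottom) * page_height
    return x0, y0, x1, y1


print(box_to_pixels("0.25\n0.125\n0.5\n0.25", 800, 1000))
# (200.0, 125.0, 400.0, 750.0)
```

If the last two values instead turn out to be the absolute position of the bottom-right corner, the `1.0 -` terms simply go away.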
text*.style files contain a list of line feed separated instructions
on how and where to change the formatting of the text. Every instruction contains
five parts, separated by spaces:
- A command
- An argument
- The index (zero-based) of the first character this instruction applies to
- The index of the first character it no longer applies to
Note that the start and end points of the instructions do not have to describe a nicely nested structure: this is not XML or Lisp, this is a finite-state machine.
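Because the ranges can overlap without nesting, a convenient approach is to cut the text at every start and end point and record which instructions are active in each piece. The Python sketch below does that. It assumes the fields appear in the order listed above (command, argument, start, end) and ignores anything after the fourth field; the command names X and Y are placeholders, not real LectureNotes commands.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    command: str
    argument: str
    start: int  # first character the instruction applies to
    end: int    # first character it no longer applies to


def parse_style(style_text):
    """Parse the contents of a .style file, one instruction per line."""
    instructions = []
    for line in style_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        instructions.append(
            Instruction(parts[0], parts[1], int(parts[2]), int(parts[3])))
    return instructions


def split_into_runs(text, instructions):
    """Split text into runs that share the same set of active instructions."""
    cuts = {0, len(text)}
    for ins in instructions:
        cuts.add(min(max(ins.start, 0), len(text)))
        cuts.add(min(max(ins.end, 0), len(text)))
    edges = sorted(cuts)
    runs = []
    for a, b in zip(edges, edges[1:]):
        active = [(i.command, i.argument) for i in instructions
                  if i.start <= a and b <= i.end]
        runs.append((text[a:b], active))
    return runs


style = parse_style("X 1 0 6\nY 2 4 10")
for run in split_into_runs("abcdefghij", style):
    print(run)
# ('abcd', [('X', '1')])
# ('ef', [('X', '1'), ('Y', '2')])
# ('ghij', [('Y', '2')])
```

Each run can then be rendered with a single, fixed set of formatting attributes.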
These are the commands I have been able to identify:
- Set font (serif, sans-serif, monospace)
- Set style (bitmask; 1 = bold, 2 = italic)
- Enable underline (arg: 0, ignored?)
- XOR underline bit (arg: 0, ignored?)
- Set text colour
- Change size (arg: new size in current em)
- Subscript: shift text down (guess: by 50% of the original font size?)
- Superscript: shift text up (ditto)
Note that the subscript and superscript commands only change the position, not the font size.
Colours are stored as signed 32-bit integers; the most significant byte is always
255 (alpha perhaps), the three other bytes are red, green and blue. For instance,
-16776961 = FF0000FF is blue, -65536 = FFFF0000 is red, and
-16777216 = FF000000 is black.
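Decoding such a value takes a mask and a few shifts. This Python sketch assumes the byte order is alpha, red, green, blue from most to least significant, as described above; the function name is made up.

```python
def argb_from_int(value):
    """Split a signed 32-bit colour value into (red, green, blue, alpha)."""
    # Reinterpret the signed value as an unsigned 32-bit number.
    unsigned = value & 0xFFFFFFFF
    alpha = (unsigned >> 24) & 0xFF
    red = (unsigned >> 16) & 0xFF
    green = (unsigned >> 8) & 0xFF
    blue = unsigned & 0xFF
    return red, green, blue, alpha


print(argb_from_int(-65536))     # (255, 0, 0, 255), red
print(argb_from_int(-16777216))  # (0, 0, 0, 255), black
```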
Timestamps (not that they matter to me) are UNIX time in milliseconds, that is to say milliseconds since 1970-01-01 00:00:00.000 UTC.
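Should the timestamps ever be needed, converting one is a one-liner in Python. The function name and the sample value are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_datetime(milliseconds):
    """Convert milliseconds since the UNIX epoch to an aware UTC datetime."""
    # Adding a timedelta avoids float rounding of the millisecond part.
    return EPOCH + timedelta(milliseconds=milliseconds)


print(timestamp_to_datetime(1500000000000).isoformat())
# 2017-07-14T02:40:00+00:00
```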
I think I have gathered sufficient notes on the LectureNotes format to be able to reconstruct with reasonable success most notebooks using a script of some kind, with some trial and error.
There are still some details missing, but these can probably be safely ignored and/or filled in during the next steps.
What won't be so easy to fill in are the page background patterns; however, I don't care a lot about these seeing as I prefer a white background for my own notebooks anyway.
This post may be updated as further details emerge.
PS: I have published a program to convert LectureNotes data on GitHub at tjol/lecturenotes2pdf.
This article is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.