For the past few months, I’ve been using the Android note-taking app LectureNotes to keep a digital lab journal on my Lenovo Yoga Book.
To make it easier to share my notebook with colleagues when the need arises, I would like to convert my records to PDF format automatically. While the app has a PDF export feature, I’d much rather have a PDF generated every day without manual intervention. To do this, I have to reverse engineer the file format LectureNotes uses to store my notebooks.
Basic structure
Thankfully, LectureNotes stores its data on the file system on the tablet’s internal storage (at /Android/data/com.acadoid.lecturenotes/files), making it easy to back up and relatively easy to parse.
Notebooks can optionally be organised into folders. As far as I can tell, this only goes one level deep, that is to say there cannot be folders within folders. On disk, every folder and every notebook is a directory containing some metadata in XML files. The file system layout looks something like Figure 1.
Metadata
The name of the directory is the name of the folder or notebook, respectively. The XML files contain:
- various metadata about the cover shown on the notebooks board (colour and so on) – which is hardly of relevance for PDF generation.
- information on the state of the app, such as which page and layer were last in focus – again, I don’t care.
- some important ‘structural’ information about the notebook:
- The page size in pixels:
<paperwidth>1920</paperwidth>
<paperheight>2717</paperheight>
- Details about the page background:
<paperpattern>1</paperpattern>
<papercolor>16777215</papercolor>
<patterncolor>12632319</patterncolor>
The trouble is that paper pattern ‘1’ is only meaningful to LectureNotes itself. Pattern ‘0’ is no pattern (just a solid colour).
- Settings for the text layer:
<textlayersettings>1</textlayersettings> <!-- always 1? -->
<textlayerfontfamily>0</textlayerfontfamily> <!-- no idea -->
<textlayerfontstyle>0</textlayerfontstyle>
<textlayerfontsize>61.0</textlayerfontsize>
<textlayerfontcolor>-16777216</textlayerfontcolor>
<textlayerleftmargin>0.005</textlayerleftmargin>
<textlayertopmargin>0.005</textlayertopmargin>
<textlayerrightmargin>0.005</textlayerrightmargin>
<textlayerbottommargin>0.005</textlayerbottommargin>
Colour and style are described below. The font size appears to be 4/3 of the font’s em size in pixels. I have no idea whether this metric was chosen deliberately, or if it’s an accident of implementation. Perhaps it has something to do with the line height? The margins are clearly fractions, probably of the page height and width. I have no idea how to interpret the first two settings.
- Some information on layering:
<layers>2</layers>
<displayedlayers>3</displayedlayers>
<textlayer>2</textlayer>
<displaytextlayer>1</displaytextlayer>
In this case, there are two bitmap layers, plus the (forever single) text layer. All three layers are visible, and the text layer is between bitmap layer 1 and bitmap layer 2. When the text layer is below layer 1, <textlayer> is 1. When the text layer does not exist, <displaytextlayer> is still set to 1, but <displayedlayers> doesn’t count it!
- Some things that sound possibly important but are completely opaque to me:
<paperscale>0.5</paperscale>
<paperfit>0</paperfit>
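A minimal sketch of how the structural settings above could be read and interpreted in Python. The root element name `notebook` in the sample is an assumption (only the child tags come from the listing above), and `read_settings`, `stacking_order` and `em_size_px` are hypothetical helper names. The layer ordering rule is extrapolated from the two observed cases, and the case of a missing text layer is not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root element name is an assumption;
# only the child tags are taken from the settings listed above.
SAMPLE = """<notebook>
  <paperwidth>1920</paperwidth>
  <paperheight>2717</paperheight>
  <textlayerfontsize>61.0</textlayerfontsize>
  <layers>2</layers>
  <displayedlayers>3</displayedlayers>
  <textlayer>2</textlayer>
  <displaytextlayer>1</displaytextlayer>
</notebook>"""


def read_settings(xml_text):
    """Collect the text of every leaf element into a dict keyed by tag name."""
    root = ET.fromstring(xml_text)
    return {el.tag: (el.text or "").strip() for el in root.iter() if len(el) == 0}


def stacking_order(settings):
    """Return layer labels from bottom to top.

    Assumes <textlayer> = k means the text sits directly below bitmap layer k,
    which matches the two observed cases (k = 1 and k = 2). Anything beyond
    that is extrapolation.
    """
    n_bitmaps = int(settings["layers"])
    text_pos = int(settings["textlayer"])
    order = [f"bitmap{i}" for i in range(1, n_bitmaps + 1)]
    order.insert(text_pos - 1, "text")
    return order


def em_size_px(settings):
    """Font em size in pixels, if the stored size really is 4/3 of it."""
    return float(settings["textlayerfontsize"]) * 3 / 4


settings = read_settings(SAMPLE)
print(stacking_order(settings))
print(em_size_px(settings))
```

All values are kept as strings in the dictionary and converted only where they are used, since the meaning (and type) of several settings is still unknown.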
Notebook file layout
For every page, there are files called pageN.png (e.g. page1.png) and uuidN.txt. If a page contains multiple bitmap layers, there is an additional PNG file for each layer, named page1_2.png and so on. These images have transparent backgrounds and equal size; they can simply be stacked on top of one another (… if there’s no text).
If there is text in the main text field on a page, there will be a file textN.txt. IFF text styles have been changed (vis-à-vis the default), there is a corresponding file textN.style. If there are supplementary text fields on a page, there will in addition be files called textN_1.txt, textN_1.box and if necessary textN_1.style. More on all of these later.
If keywords have been set on a page, there will be a file keyN.txt. If there are multiple keywords, these are separated by line feeds (\n).
Opening the notebook overview generates thumbnails of all the (rendered) pages, which are stored as thumbnailN.png in the same directory. If the notebook overview has not been opened, these will not exist.
The notebook directory also always contains an empty PNG image of the right size as empty.png, and a small (84×84) icon for the notebook.
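The naming scheme above can be turned into a small index of files per page. The following is a sketch only: `group_by_page` is a hypothetical helper, and treating pageN.png as "layer 1" and the main text field as "field 0" are my own conventions, not something the app documents.

```python
import re
from collections import defaultdict

# Patterns follow the file names described above; N is the page number.
LAYER_RE = re.compile(r"^page(\d+)(?:_(\d+))?\.png$")
TEXT_RE = re.compile(r"^text(\d+)(?:_(\d+))?\.(txt|box|style)$")
KEY_RE = re.compile(r"^key(\d+)\.txt$")


def group_by_page(filenames):
    """Map page number -> {'layers': {...}, 'text': {...}, 'keywords': ...}."""
    pages = defaultdict(lambda: {"layers": {}, "text": {}, "keywords": None})
    for name in filenames:
        if m := LAYER_RE.match(name):
            # pageN.png is taken to be layer 1 (assumption), pageN_K.png layer K.
            pages[int(m[1])]["layers"][int(m[2] or 1)] = name
        elif m := TEXT_RE.match(name):
            # Field 0 is the main text field, field K a supplementary one.
            field = pages[int(m[1])]["text"].setdefault(int(m[2] or 0), {})
            field[m[3]] = name
        elif m := KEY_RE.match(name):
            pages[int(m[1])]["keywords"] = name
        # Everything else (thumbnails, uuid files, empty.png, ...) is ignored.
    return dict(pages)


names = ["page1.png", "page1_2.png", "text1.txt", "text1_1.txt",
         "text1_1.box", "key1.txt", "thumbnail1.png", "empty.png"]
print(group_by_page(names))
```

On a real notebook, the list of names would come from something like `os.listdir(notebook_dir)`.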
Text fields
The text*.txt files are easy enough to understand: they simply contain plain text, without any formatting or markup. Beyond that, it gets a little more complicated.
‘Box’ files, e.g. text2_1.box, contain four numbers (example 1), separated by line feeds. The first two are the fractional X and Y distance of the top-left corner of the box from the top-left corner of the page. The others might be the fractional offsets of the bottom-right corner from the bottom-right corner of the page.
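As a sketch of that interpretation, the four fractions can be converted to a pixel rectangle. The helper name `box_to_pixels` is hypothetical, the input values below are made up for illustration, and reading the last two numbers as insets from the right and bottom edges is only the guess described above.

```python
def box_to_pixels(box_text, page_width, page_height):
    """Convert the four fractions of a .box file to a pixel rectangle.

    Returns (left, top, right, bottom). The last two values are treated as
    insets from the right and bottom page edges, which is only a guess.
    """
    left, top, right_inset, bottom_inset = (float(v) for v in box_text.split())
    return (
        left * page_width,
        top * page_height,
        (1 - right_inset) * page_width,
        (1 - bottom_inset) * page_height,
    )


# Made-up values for illustration, not taken from a real notebook.
print(box_to_pixels("0.25\n0.5\n0.25\n0.25\n", 1920, 2717))
```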
The text*.style files contain a list of line feed separated instructions on how and where to change the formatting of the text. Every instruction contains five parts, separated by spaces:
- A command
- An argument
- The first character (zero indexed) this line applies to
- The first character it does not apply to
- <unknown>
Note that the start and end points of the instructions do not have to describe a nicely nested structure: this is not XML or Lisp, this is a finite-state machine.
These are the commands I have been able to identify:
Command | Comment |
---|---|
typeface | Set font (serif, sans-serif, monospace) |
styleset | Set style (bitmask; 1 = bold, 2 = italic) |
stylexor | XOR style |
underline | Enable underline (arg: 0 – ignored?) |
underlinexor | XOR underline bit (arg: 0 – ignored?) |
foregroundcolor | Set text colour |
relativesize | Arg: new size in current em |
subscript | Shift text down (guess: by 50% of the original font size?) |
superscript | Shift text up (ditto) |
Note the subscript and superscript commands only change the position, not the font size.
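To illustrate how such instructions might be evaluated, here is a sketch that parses a style file and computes the bold/italic bitmask for every character. `Instruction`, `parse_style` and `style_bits` are hypothetical names, the sample instructions are made up, and I am assuming that instructions take effect in file order and that arguments never contain spaces. Only styleset and stylexor are handled.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    command: str
    argument: str
    start: int    # first character affected (zero indexed)
    end: int      # first character no longer affected
    unknown: str  # fifth field, meaning not known


def parse_style(style_text):
    """Parse the line-feed-separated instructions of a text*.style file."""
    result = []
    for line in style_text.splitlines():
        parts = line.split()
        if len(parts) == 5:  # skip blank or unexpected lines
            cmd, arg, start, end, unknown = parts
            result.append(Instruction(cmd, arg, int(start), int(end), unknown))
    return result


def style_bits(instructions, length):
    """Per-character bold/italic bitmask (1 = bold, 2 = italic).

    Assumes instructions take effect in file order, which is not confirmed.
    """
    bits = [0] * length
    for ins in instructions:
        for i in range(ins.start, min(ins.end, length)):
            if ins.command == "styleset":
                bits[i] = int(ins.argument)
            elif ins.command == "stylexor":
                bits[i] ^= int(ins.argument)
    return bits


# Made-up instructions: bold on characters 0-4, toggle italic on 3-7.
sample = "styleset 1 0 5 0\nstylexor 2 3 8 0\n"
print(style_bits(parse_style(sample), 10))
```

Because the ranges overlap without nesting, tracking state per character is simpler than trying to build a tree of spans.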
Colour
Colours are stored as signed 32-bit integers; the most significant byte is always 255 (alpha perhaps), the three other bytes are red, green and blue. For instance, -16776961 = FF0000FF is blue, -65536 = FFFF0000 is red, and -16777216 = FF000000 is black.
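The conversion can be sketched in a few lines of Python: masking with 0xFFFFFFFF reinterprets the signed value as unsigned, after which the four bytes can be shifted out. The function names `decode_colour` and `to_hex` are hypothetical, and reading the top byte as alpha is the guess from above.

```python
def decode_colour(value):
    """Split a signed 32-bit colour into (alpha, red, green, blue) bytes."""
    unsigned = value & 0xFFFFFFFF  # reinterpret as unsigned two's complement
    return (
        (unsigned >> 24) & 0xFF,
        (unsigned >> 16) & 0xFF,
        (unsigned >> 8) & 0xFF,
        unsigned & 0xFF,
    )


def to_hex(value):
    """Eight-digit hexadecimal form of the colour."""
    return f"{value & 0xFFFFFFFF:08X}"


for v in (-16776961, -65536, -16777216):
    print(v, to_hex(v), decode_colour(v))
```

Note that the <papercolor> value 16777215 from the metadata decodes to 00FFFFFF, so the top byte is apparently not 255 in every setting.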
Timestamps
Timestamps (not that they matter to me) are UNIX time in milliseconds, that is to say the number of milliseconds since 1970-01-01 00:00:00.000 UTC.
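Should they ever be needed, converting them is a one-liner. This sketch uses a made-up value, and `parse_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone


def parse_timestamp(ms):
    """Convert milliseconds since the UNIX epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Made-up value for illustration.
print(parse_timestamp(1500000000000).isoformat())  # 2017-07-14T02:40:00+00:00
```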
Conclusion
I think I have gathered sufficient notes on the LectureNotes format to be able to reconstruct with reasonable success most notebooks using a script of some kind, with some trial and error.
There are still some details missing, but these can probably be safely ignored and/or filled in during the next steps.
What won’t be so easy to fill in are the page background patterns; however, I don’t care a lot about these seeing as I prefer a white background for my own notebooks anyway.
PS: I have published a program to convert LN data on GitHub at tjol/lecturenotes2pdf
This article is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.