I'm doing it again. The pdf library organisation, I mean.

As the long-term readers among you might know, I use EndNote for my citation and bibliography needs. Theoretically, I also have all my .pdf files of articles in one folder on my system, and all the titles of these articles in the EndNote database.

Theoretically. Which, unfortunately, is not at all what I really have - over time, a large stack of .pdfs has accumulated that are not yet in the database, because adding each one meant opening the file, copying the relevant bibliographical data into the database, then saving the file under a new (and unique) name that also had to be written into the database entry. Yes, the programme can theoretically attach files to database entries - but I have never gotten the hang of that, and I also prefer to keep things separate in case of disasters.

So here I was, with a stack of pdf files - unsorted, and with quite a few non-articles that had crept in among them - and my database. Enter Qiqqa, a database/citation tool geared entirely towards .pdf collections. (If it were a little less geared towards those, and a bit more open and friendlier about importing from EndNote, I would have considered switching to it completely.) While checking out Qiqqa, I already tried to use it for sorting, organising, and EndNote-ing my pdf files, but it turned out to be a tiny bit less trivial than I had thought.

So I have started with a clean slate in Qiqqa and have now tackled (again) the task of sorting my files and inputting them into EndNote. It's still a multi-step process, but much less tedious than before. The preparation step was to make three new folders for sorting: one to hold the batch of pdfs for processing, one to hold the exported files, and one for the "rejects" - files that have obscure bibliographical data that will have to be entered by hand.

Step one: Move a batch of pdf files from the big heap into the processing folder.
Step two: Import that folder into Qiqqa (or set it as a watch folder). The programme will now index (and, if necessary, OCR) those files.
Step three: Use the built-in BibTeX Sniffer to match bibliographical data to the individual files, and delete all the non-articles from the library.
Step four: Make sure to move all the files for hand processing from the processing folder to the "rejects" folder (else they will be lost), then delete them from the library.
Step five: Export bibliographical data to a .bib file.
Step six: Export the complete library to the export folder.
Step seven: Convert the BibTeX file to an EndNote .xml file using this nifty little programme.
Step eight: Import the bibliographical data into EndNote (excluding duplicates). (I have only had one minor glitch with importing so far, which seems to have been caused by an incompatible record type number.)
Step nine: Add "pdf file available" or something similar to a suitable field of each of the new references (this can be done quickly with "change and move fields").
Step ten: Move all the exported files from the "doc" subfolder in the export folder into the regular folder for referenced pdf files.
Step eleven: Delete everything from the export folder and the processing folder.
Step twelve: Delete all the entries from the Qiqqa library.

Then start over... until everything is processed. It takes some time, but on the other hand, it allows me to be sure I get everything referenced and lets me clear out all the other pdfs that crept in without too much woe. And since I can do this in smaller batches, it is also not as overwhelming as adding hundreds of BibTeX entries at once.
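
For anyone who would rather script the pure file-shuffling parts of this loop (steps one, ten and eleven), a minimal Python sketch might look like the following. It is only an illustration under assumptions: the function names, the batch size and the folder layout are made up, and nothing here talks to Qiqqa or EndNote - it only moves files around. The one thing taken from the steps above is that the exported files sit in a "doc" subfolder of the export folder.

```python
import shutil
from pathlib import Path


def move_batch(heap: Path, processing: Path, batch_size: int = 20) -> list[Path]:
    """Step one: move up to batch_size pdf files from the heap into the processing folder."""
    processing.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(p for p in heap.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    moved = []
    for pdf in pdfs[:batch_size]:
        target = processing / pdf.name
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


def collect_exports(export: Path, referenced: Path) -> list[Path]:
    """Step ten: move files from export/doc into the referenced-pdf folder.

    Files whose name already exists in the target folder are NOT overwritten;
    they are left in place and returned so they can be checked by hand.
    """
    referenced.mkdir(parents=True, exist_ok=True)
    doc = export / "doc"
    skipped = []
    if not doc.is_dir():
        return skipped
    for f in sorted(doc.iterdir()):
        if not f.is_file():
            continue
        target = referenced / f.name
        if target.exists():
            skipped.append(f)
            continue
        shutil.move(str(f), str(target))
    return skipped


def clear_folder(folder: Path) -> None:
    """Step eleven: delete everything inside the folder, but keep the folder itself."""
    for entry in folder.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
```

A round would then be `move_batch(heap, processing)` before the Qiqqa steps, and `collect_exports(export, referenced)` followed by `clear_folder(export)` and `clear_folder(processing)` after the EndNote import. Since `clear_folder` deletes without asking, it is safer to call it only when `collect_exports` returned an empty list, and only after the rejects have been moved out of the processing folder as in step four.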

(And if this blogpost has made you want a bibliography programme/database, here is a list of those currently available, including EndNote and Qiqqa.)