Tuesday, 5 March 2013

Endnote to LaTeX Transfer

Peter Robejsek very kindly sent me an email recently outlining a problem he had, and the solution he came up with: transferring a Word document with Endnote references to LaTeX using an AppleScript.

Since at the time I had a fairly large (100+ pages) document already written in Word and wanted to transition to LaTeX, I faced the problem of having to replace all the citations. Doing this manually would have been terrible, so I wrote a bit of AppleScript to automate the whole thing with the help of Excel.

I thought it might be useful for others as well, especially given the large number of Mac users these days. So if you would care to publish it on the blog, please do so (if not, that's also fine). The code is not very polished, since I had never written any AppleScript before, and I am sure the whole strategy could be improved, but back when I worked it out I was looking for a quick rather than an elegant solution.

So, here is the description and the script below in plain text. Any comments, please let me know.


This document gives a step-by-step procedure for exporting a library from Endnote into JabRef and using that to generate BibTeX. The second part (steps 3 & 4) shows how to take an existing Word document with an Endnote bibliography and automatically replace the references with LaTeX citation commands. It is assumed that the natbib style is to be used with LaTeX and that the user is on Mac OS X.

1. Endnote
The goal is to get the Endnote Reference Number, the first author's name, the date and the BibTeX key into the same Excel spreadsheet.
a) Start with Endnote. Create an export output style (e.g. go to the Finder, Usr/Applications/Endnote/Styles/BibTeXExport.ens and create a copy; rename it to BibTeXExportToJabref.ens or something like that).
b) In Endnote go to Edit -> Output Styles -> Open Style Manager and select the newly created output style.
c) In Endnote apply the new output style to your current library and go to Edit -> Output Styles -> Edit "new output style".
d) For every entry in the Bibliography.Templates category add another field like "|   `endnotekey = `{Record Number}" as well as a "," after the preceding field entry. The result will look like this:

|   `keywords = `{Keywords},
|   `year = `{Year},
|   `url = `{URL},
|   `endnotekey = `{Record Number}
e) Export the library from Endnote as a .txt file using your newly created output style.

2. JabRef
a) Add the "endnotekey" field: go to Options -> Set General Fields and append ";endnotekey" to the line beginning with "general".
b) Make the field visible in JabRef: Options -> Preferences -> Entry Table Columns -> add field "endnotekey".
c) Autogenerate BibTeX keys. The resulting library file will be used with LaTeX.
d) Also avoid having the url and note fields show up in the bibliography: go to Tools -> Set/Clear/Rename Fields and set these fields to " " in all entries, with overwrite active.
e) Create a copy of this library file, open the copy in JabRef and rename the "endnotekey" field to "note" (Tools -> Set/Clear/Rename Fields). This is necessary so that its contents can be exported to *.csv; it transfers the Endnote Reference Numbers to the "note" field. Some entries in the database may already have text in "note", however. In that case the text needs to be deleted first: Tools -> Set/Clear/Rename Fields -> Clear Fields + overwrite existing values.
f) Export the .bib library as a .csv library.
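
Before moving on to the spreadsheet, it can be worth checking that every exported row still carries its Endnote record number. The following is only a minimal Python sketch of such a check, not part of Peter's procedure. The sample data and the helper name load_rows are made up for illustration, and the column names are assumed to be the ones listed in step 3 a); a real JabRef export contains more columns and may label them differently.

```python
import csv
import io

# Hypothetical two-row sample standing in for the file exported in step 2 f).
SAMPLE_CSV = '''"Identifier","Author","Title","Year","note"
"Smith2010","Smith, John and Doe, Jane","Bank runs","2010","12"
"Doe2011","Doe, Jane","Capital rules","2011","7"
'''

# Assumed column names: BibTeX key, authors, year, Endnote record number.
WANTED = ("Identifier", "Author", "Year", "note")


def load_rows(handle):
    """Keep only the four columns needed to build the citation strings."""
    reader = csv.DictReader(handle)
    missing = [name for name in WANTED if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{name: row[name].strip() for name in WANTED} for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Entries whose "note" field is not a plain number lost their record number
# somewhere along the way and would never match a citation in Word.
unmatched = [row["Identifier"] for row in rows if not row["note"].isdigit()]
```

With a real file, io.StringIO(SAMPLE_CSV) would be replaced by an open file handle; any keys that end up in unmatched are worth fixing in JabRef before building the spreadsheet.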

3. Excel
a) Import the *.csv library into Excel. Delete all columns except for "Identifier" (=BibTeX key), "Author", "Year" and "note".
b) Assuming the remaining fields are in the order above (columns A to D) with one header row, enter the following formula in E2: =LEFT(B2;SEARCH(",";B2;1)-1). This extracts the last name of the first author. (The formulas here use ";" as the argument separator; some Excel language settings expect "," instead.)
c) Then enter this formula into F2: =CONCATENATE("{";E2;", ";C2;" #";D2;"}"). This gives the same format as an unformatted Endnote citation. Also enter into G2: =CONCATENATE("{";E2;", ";C2;" #";D2;"@@author-year}") to get the Endnote formatted author-year style. Finally, enter into J2, K2 and L2: =CONCATENATE("{";E2;", ";C2;" #";D2;";"), =CONCATENATE(E2;", ";C2;" #";D2;";") and =CONCATENATE(E2;", ";C2;" #";D2;"}") to cover "mass citations", i.e. several references in one set of brackets (first, middle and last reference respectively).
d) Enter into H2 the corresponding LaTeX format that goes with F2 (i.e. gives author comma date in brackets): =CONCATENATE("\citep{";A2;"}") and into I2 the corresponding format for Author (date): =CONCATENATE("\citet{";A2;"}"). Corresponding to the mass citation case, enter in M2, N2 and O2: =CONCATENATE("\citep{";A2;","), =CONCATENATE(A2;",") and =CONCATENATE(A2;"}") respectively. Now fill down cells E to O.
e) Then copy these cells (columns F to O) and paste special (values) them into columns P-Y; make a note of the number of rows. (Note: in case you notice any weird symbols, e.g. ş gets translated as Yue or something of the sort, you need to search and replace these before doing step e) in order to be sure that no citations get left behind.)
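
For readers who find the spreadsheet formulas hard to follow, here is a rough Python sketch of what columns E to O compute for a single library row. It is an illustration under assumptions rather than a replacement for the Excel step: the function name citation_pairs and the sample values are invented, and the author field is assumed to be in "Last, First" form.

```python
def citation_pairs(key, author, year, record_number):
    """Return (Endnote text, LaTeX text) pairs for one library row."""
    # Column E: last name of the first author, i.e. the text before the first
    # comma. Unlike the Excel formula, this does not fail when there is no
    # comma; it simply returns the whole field.
    last_name = author.split(",", 1)[0]
    core = f"{last_name}, {year} #{record_number}"
    return [
        ("{" + core + "}", "\\citep{" + key + "}"),               # F -> H
        ("{" + core + "@@author-year}", "\\citet{" + key + "}"),  # G -> I
        ("{" + core + ";", "\\citep{" + key + ","),               # J -> M (first in a mass citation)
        (core + ";", key + ","),                                  # K -> N (middle)
        (core + "}", key + "}"),                                  # L -> O (last)
    ]


pairs = citation_pairs("Smith2010", "Smith, John and Doe, Jane", "2010", "12")
# pairs[0] is ("{Smith, 2010 #12}", "\\citep{Smith2010}")
```

Each pair holds the text to search for in Word and the natbib command or fragment that should replace it.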

4. Word
a) Go to Tools -> Endnote X5 -> Convert to Unformatted Citations. Then select the entire text in which you want your Endnote citations replaced (cmd+a).
b) Run this AppleScript, replacing the upper bound of the loop (345 here) with the number of rows from 3.e).
-- Replace 345 with the number of rows noted in step 3.e)
repeat with theIncrementValue from 1 to 345
    -- Read the Endnote-style search strings for this row
    tell application "Microsoft Excel"
        set rg1 to "P" & theIncrementValue
        set rg2 to "Q" & theIncrementValue
        set rg3 to "T" & theIncrementValue
        set rg4 to "V" & theIncrementValue
        set rg5 to "U" & theIncrementValue
        set EndForm1 to value of range rg1 as string
        set EndForm2 to value of range rg2 as string
        set EndForm3 to value of range rg3 as string
        set EndForm4 to value of range rg4 as string
        set EndForm5 to value of range rg5 as string
    end tell
    -- Read the matching LaTeX replacement strings for this row
    tell application "Microsoft Excel"
        set rg6 to "R" & theIncrementValue
        set rg7 to "S" & theIncrementValue
        set rg8 to "W" & theIncrementValue
        set rg9 to "Y" & theIncrementValue
        set rg10 to "X" & theIncrementValue
        set TexForm1 to value of range rg6 as string
        set TexForm2 to value of range rg7 as string
        set TexForm3 to value of range rg8 as string
        set TexForm4 to value of range rg9 as string
        set TexForm5 to value of range rg10 as string
    end tell
    -- Replace every occurrence within the current selection in Word
    tell application "Microsoft Word"
        set findRange to find object of selection
        tell findRange
            execute find find text EndForm1 replace with TexForm1 replace replace all
            execute find find text EndForm2 replace with TexForm2 replace replace all
            execute find find text EndForm3 replace with TexForm3 replace replace all
            execute find find text EndForm4 replace with TexForm4 replace replace all
            execute find find text EndForm5 replace with TexForm5 replace replace all
        end tell
    end tell
end repeat
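
To see what the loop does to a piece of text, including a mass citation, the same replacements can be tried on a plain string. The Python sketch below is only a stand-in for the Word find-and-replace, with made-up data: ROWS plays the role of columns P-Y for two hypothetical library entries, in the order in which the script applies them.

```python
# Hypothetical stand-in for spreadsheet columns P-Y: for each library row,
# five (find text, replacement) pairs in the order the script applies them.
ROWS = [
    [
        ("{Smith, 2010 #12}", r"\citep{Smith2010}"),               # P -> R
        ("{Smith, 2010 #12@@author-year}", r"\citet{Smith2010}"),  # Q -> S
        ("{Smith, 2010 #12;", r"\citep{Smith2010,"),               # T -> W
        ("Smith, 2010 #12}", "Smith2010}"),                        # V -> Y
        ("Smith, 2010 #12;", "Smith2010,"),                        # U -> X
    ],
    [
        ("{Doe, 2011 #7}", r"\citep{Doe2011}"),
        ("{Doe, 2011 #7@@author-year}", r"\citet{Doe2011}"),
        ("{Doe, 2011 #7;", r"\citep{Doe2011,"),
        ("Doe, 2011 #7}", "Doe2011}"),
        ("Doe, 2011 #7;", "Doe2011,"),
    ],
]


def replace_all(text, rows):
    """Apply every pair of every row to the text, row by row."""
    for pairs in rows:
        for find_text, replacement in pairs:
            text = text.replace(find_text, replacement)
    return text


sample = "See {Smith, 2010 #12} and {Doe, 2011 #7;Smith, 2010 #12}."
result = replace_all(sample, ROWS)
# result is "See \citep{Smith2010} and \citep{Doe2011,Smith2010}."
```

The mass citation is assembled piece by piece: the Doe row turns its opening part into "\citep{Doe2011," and the Smith row turns the closing part into "Smith2010}".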

Please note: the above procedure does not account for author names that are agencies (European Banking Authority etc.); however, the vast majority of citations should be easily taken care of in this way. Feel free to improve on the approach as desired.
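
One way to find the citations that slipped through, such as agency names, is to search the converted text for anything that still looks like an unformatted Endnote citation. This is a small Python sketch under assumptions: the text has been saved as plain text, the pattern is guessed from the citation formats used above, and the name leftover_citations is invented for this example.

```python
import re

# Anything still shaped like "{... #123 ...}" is taken to be an Endnote
# citation that was not (or only partly) converted. This pattern is an
# assumption based on the unformatted citation format described above.
LEFTOVER = re.compile(r"\{[^{}]*#\d+[^{}]*\}")


def leftover_citations(text):
    """Return all brace groups that still contain an Endnote record number."""
    return LEFTOVER.findall(text)


converted = r"See \citep{Smith2010} and {European Banking Authority, 2012 #40}."
remaining = leftover_citations(converted)
# remaining is ["{European Banking Authority, 2012 #40}"]
```

Whatever shows up in the list can then be fixed by hand.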

Applying these steps worked well for me at the time of writing. However, I can give no warranty, explicit or implied, that the approach is fault free. The only application took place on Mac OS X 10.6.8. Always back up important files before manipulating them!


 
