Creating a cross-platform open-source command-line app in Node.js

photo-1515879218367-8466d910aaa4

So… I’ve got a confession, I’ve never contributed to open source. Well, once or twice I donated R50 or so to open source projects that I found really useful, but mostly I just use them for free and that’s OK because they’re free and the licence terms say so.

Anyway, I’ve been playing with JavaScript recently and it just so happened that I wanted to do unspeakable things to a whole bunch of Excel files at once. The obvious tool for bulk unspeakable operations is the command-line, the issue is of course that Excel files are binary. And this is where the joy that is node packages comes to the rescue, because you guessed it, there’s a free and open-source node package to read and write excel files. Hallelujah, now we’re just gonna need to create a command line interface for it and we’re done.

The Zeroth Commandment of Software Development

Of course the first thing we have to do in all such cases as these is perform a google search to try and make sure there’s no possible way we can avoid doing work. Every developer’s primary obligation under all circumstances is to avoid doing unnecessary work. Note that I said unnecessary. We get paid to do necessary work, but typing out hundreds of lines of code and/or bathing is a pain so we gotta avoid that if we can.

After vaguely searching on Google for that which I desired I established to my satisfaction that a cli tool for spitting out Excel files to stdout on the commandline does not in fact exist, although there are online conversion tools but they’re a. commercial b. you have to upload your file to some dodgy website c. no bulk operations. This proof of non-existence is not rigorous and besides I wanted to write a script so this is basically just an excuse.

Creating the app as a node script

So the first thing I did with this whole thing was I moseyed over to the npm website and scrabbled around looking at Excel packages. There are a few but one nice one is exceljs. Important point to note, it’s got an MIT licence. According to ye website here that means it’s pretty much free to do anything you want with. Print it out and feed it to your dog. Get it on a t-shirt. Sell commercial software using it.

So this package can do two things that are extremely useful. First you can use it to read an Excel file (well, an XLSX file but hardly anyone uses XLS files any more these days). Second, you can use it to output CSV. So what is our so-useful command-line script gonna do with an Excel file you ask? It’s going to eat XSLX files and it’s gonna burp out CSV to standard out. i think i gave to much ivermectin to pig And that’s it. That’s all it needs to do.

Oh WHY oh WHY I hear you cry, Oh WHY does it only have to do these two so-useful things question mark? Well quite simply dear reader, once I can spit data out to stdout I can pipe it to other command line utilities like grep and sort and all the rest of those lovely things, so all I gotta do is figure out how to pump data to std out and I am done.

So here’s the heart of the cli tool to chuck XSLX data to std out:

const exceljs = require("exceljs");
const workbook = new exceljs.Workbook();

workbook.xlsx
  .readFile(argv.f)
  .then((workbook) => {
    workbook.eachSheet((worksheet, sheetId) => {
      workbook.csv.write(process.stdout, { sheetName: worksheet.name });
    });
  })

Naturlich before you go running this script like a madman you have to have Node installed, you have to do an npm init in your chosen directory and you have to have done an npm i exceljs but y’all are professionals so I dunno why I just wrote that.

Adding some useful options

So you might have noticed the lil’ argv.f on the readFile line in the code above. That there is a command line argument for the filename. Also our barebones script isn’t quite useful yet. It’s nearly there – I can now rip apart an XLSX file and get CSV contents pretty easy – but I’d like a tad bit more control. After playing around with the script and with exceljs a bit more I decided that the following options would be quite nice:

  • option to specify the sheet name of the worksheet you want to grab
  • alternatively, specify the sheet number
  • option to specify the field separator for the CSV output
  • option to specify if I want to print the sheet name
  • option to specify if I want to print the sheet contents

Of course there are hundreds more options that would probably be useful, like specifying the exact cell ranges and what-not but we’re in a hurry here so let’s not get too fancy.

The node argument parser that seems to be held in high regard is called yargs and it is delightful. Not because it’s easy to use but because it’s pirate themed. And frankly the docs weren’t the easiest to read, but that’s how it is when you use open source. They do request anyone who feels like it to help them update the docs so who knows, maybe one day I will 🙂

Anyway after fighting with yargs for a bit you can get a pretty nice little help output that looks somewhat like this:

[davido@david-pc readXL]$ readxl-linux --help
Usage: readxl-linux [options] -f <file name>

Options:
      --version    Show version number                                 [boolean]
  -f, --file       XLSX file to read                                  [required]
      --sheet      only print the named sheet                           [string]
      --index      only print sheet at the specified index              [number]
      --contents   print worksheet contents            [boolean] [default: true]
      --names      print Worksheet names              [boolean] [default: false]
      --separator  CSV separator for output              [string] [default: ","]
  -h, --help       Show help                                           [boolean]

Examples:
  readxl-linux --no-contents --names -f     display sheet names of foo.xlsx
  foo.xlsx
  readxl-linux --index=1 -f foo.xlsx        print first sheet of foo.xlsx

Copyright D Orpen 2021

Of course now we have to change our code a bit to allow all these options to work but you can basically see where this is going so I’m gonna skip the code for now. The other nice thing about yargs is it’ll do some checking for you depending how you configure it so you can make sure that by the time your app needs to parse those command-line arguments they’re in some kinda sensible state.

Handling that which is wrong

If you were to run the advertised script as above with a filename that does not exist then you get some not-so-nice output, which is Node basically implying subtly that you’re dumb. So what is there to do about it? Well basically we just catch anything that goes wrong when we attempt to read the Excel file (including e.g. the Excel file being malformed) and log it to the console like so:

const chalk = require("chalk");

workbook.xlsx
  .readFile(argv.f)
  .then((workbook) => {
    // print workbook according to options passed from command line
  })
  .catch((err) => {
    console.error(chalk.red(err.message)); // note use of chalk library
    process.exit(1);
  });

The important thing to note here is that we are using not console.log() but console.error(). This is more than merely a convention, there are two output streams for any command-line application and they log to different places. They’ll both be displayed to the screen in the same way but if you pipe the output of a command to somewhere else, then stuff that’s part of the stderr pipe doesn’t get fed into the commands further down the pipe. This is hella useful if you’re trying to bulk process a ton of stuff at once and you don’t want errors to throw your whole operation for a loop.

Also note that we’re printing this thing out in red (using chalk) to highlight that something dodge happened when we tried to convert the workbook. Also I’m not printing out the whole error, just the error message. ivermectina tabletas dosis para adultos

Also further note that we are exiting the process and going back to the shell with an exit status of 1, not 0, which tells the shell that something went horribly wrong and not to continue processing.

Making the code cross-platform

So far all the code we’ve been playing with has been cross-platform but there’s one super annoying thing that isn’t quite working, and that’s that if you just print the workbook as is, it doesn’t print a newline character after the last character of the CSV so you get output that looks like this:

,,,2,3,4
,,,3,,5
,,,4,5,6
,,,5,6,7

,,,  ,3.1,02/02/2021[davido@david-pc readXL]$ 

and it looks like some kinda idiot wrote this code instead of me, who are obviously not an idiot.

Now I know what you’re thinking, “Just use a \n and be done with it you absurd person, how dare you assault my eyes with such incompetence” but you’d be wrong and I do so dare because it’s not as straight-forward as you probably think. The situation is so bad that there’s a WHOLE long wikipedia page on how to do new lines.

So after a bit of googling I came across this page here which not only gave me the solution to make sure that my command-line app is using cross-platform newlines, but also gives a whole ton of other hints about how to make sure your cli app runs on all 3 major desktop platforms.

const os = require("os");
const EOL = os.EOL;

Now whenever I want a newline character I can use the EOL constant, and this will work seamlessly cross-platform.

Packaging the app as cross-platform binaries

So now I have a node script that I can use to read an Excel file and dump the output to stdout. This is all great and all, but there’s one more super cool step you can take specifically with Node and that is, you can very easily compile cross-platform binaries for your app. This… may not seem like much but I can tell ya, a single-code base cross-platform app that’s reasonably easy to put together and compile… that’s kinda mind-blowing. In the bad old days of… only a few years ago… apps were written just for one operating system. securo ivermectina 6 mg donde comprar You couldn’t dev on Linux for Windows or indeed dev on Windows for Mac or anything like that. Oh sure, some people could do some wizardry with C++ and very carefully written cross-platform libraries but you’d always get platform specific bugs and platform-specific code would form a significant chunk of your codebase. I know this only by reputation, I never wrote cross-platform code, it was too hard… and now there’s just a little bit of magic and boom, you’ve got working applications for Windows, Mac and Linux. That is cool.

So here’s how we do it:

  • first we need to install the pkg tool: npm install --save-dev pkg
  • next we add some things to our package.json:
  "bin": "./index.js",
  "pkg": {
    "outputPath": "dist"
  }
  • finally we run the tool: npx pkg --compress GZip .

And that’s it. Our dist/ directory will now contain binaries built for the 3 different platforms.

Wrapping up

So that’s just about everything. The final piece of this puzzle is to publish this as open-source software itself. That involves first hosting the repository on github, and second choosing a licence. Since the major component of this software is the exceljs package which already has an MIT licence it seems normal to publish this as MIT licenced software itself. Let’s be clear here, I hardly did any work but having a command-line tool to do violence to Excel files feels like the kinda thing that might be useful to someone, so that’s… why I’m putting it out there. Also this is my first time actually making something open-source, and look how easy it is – maybe you who read this far will decide to publish something open-source too 🙂

You can find the full source code hosted here

Realistically I should add a much more descriptive README file, but for the time being I think it’s OK.

About the Author

I’ve got some experience writing code. I run quite a lot. The last book I read was about Gavrilo Princip. There’s about to be a thunderstorm here. Here is Johannesburg, South Africa. There is another Johannesburg in the USA. I know how to make chocolate sauce.