File Formats

CSV

This is one of the oldest and most common file formats. It does have some serious issues, but its flexibility and cross-platform compatibility make it a frequent lowest common denominator.

Recently I have been using CSV files with Microsoft Excel and a couple of noteworthy points came to light. First, Excel cannot cope with numbers longer than 15 digits. It does not warn you about this either: it keeps the first 15 digits, counting from the left, and replaces the rest with zeros. Second, it is hard to get ID numbers with leading zeros into Excel as text. This affects mobile phone numbers, for example, but also IDs like 0001, which becomes 1. There are two options: you can write ="01" into your CSV file, or "="01"", which looks odd but does actually work.
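
As a rough sketch of the leading-zero workaround, the following Python uses the standard csv module to wrap chosen columns in the ="..." form. The function names, column names and sample data are made up for illustration. Note that the csv module escapes the embedded quotes by doubling them, so the cell is written as "=""0001""", which is the standard CSV quoting of ="0001"; I would expect Excel to treat it the same way, but that is an assumption worth checking against your version.

```python
import csv
import io


def excel_text_cell(value: str) -> str:
    """Wrap a value as an Excel formula that evaluates to text, e.g. ="0001"."""
    return f'="{value}"'


def to_excel_csv(header, rows, text_columns):
    """Build CSV text, protecting the named columns from numeric conversion."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(
            [
                excel_text_cell(cell) if name in text_columns else cell
                for name, cell in zip(header, row)
            ]
        )
    return buffer.getvalue()


# Hypothetical sample data: an ID and a phone number, both with leading zeros.
csv_text = to_excel_csv(
    header=["id", "phone", "name"],
    rows=[["0001", "07700900123", "Alice"]],
    text_columns={"id", "phone"},
)
print(csv_text)
```

Only the columns listed in text_columns are wrapped, so genuine numbers elsewhere in the file still arrive in Excel as numbers.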

JSON

The JSON (pronounced like the name Jason) format came about because XML was just too verbose. The specification for JSON Schema is available on the JSON Schema site, along with links to various implementations of validators and generators. There is a nice online editor at JSON Editor Online. However, I have also found ObjGen, a live JSON generator, useful, and its output can be put into JSON Schema Tool to get a proper schema.
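
To give a feel for what a schema describes, here is a minimal sketch in Python using only the standard json module. The schema, the property names and the check_document function are all hypothetical, and the function only understands the "required" and "type" keywords on a flat object. It is not a real JSON Schema validator; for anything serious, use one of the implementations listed on the JSON Schema site.

```python
import json

# Hypothetical schema for illustration; the property names are assumptions.
SCHEMA = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
}

# Maps JSON Schema type names to Python types (only those this sketch supports).
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_document(document, schema):
    """Return a list of problems; an empty list means the checks passed."""
    if not isinstance(document, dict):
        return ["document is not an object"]
    problems = []
    for name in schema.get("required", []):
        if name not in document:
            problems.append(f"missing required property: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name not in document:
            continue
        value = document[name]
        type_name = rule["type"]
        matches = isinstance(value, PYTHON_TYPES[type_name])
        # bool is a subclass of int in Python, so reject it for "integer".
        if type_name == "integer" and isinstance(value, bool):
            matches = False
        if not matches:
            problems.append(f"{name} should be of type {type_name}")
    return problems


good = json.loads('{"id": "0001", "name": "Alice", "age": 30}')
bad = json.loads('{"id": 1, "age": true}')
print(check_document(good, SCHEMA))
print(check_document(bad, SCHEMA))
```

Keeping the ID as a JSON string rather than a number also preserves its leading zeros.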

Markdown

Files with the extension ".md" which contain Markdown are becoming increasingly common. GitHub uses them, as does Stash from Atlassian. However, GitHub openly admit they use their own flavour of Markdown. Here are some handy hints:

  • New Line: end the line with two space characters and then press Enter
  • New Paragraph: separate paragraphs with a blank line, in other words two new lines
  • Bullet List: start each line with "- ", "+ " or "* "; any of them will do
  • Numbered List: I would recommend starting every item with "1. "; the renderer numbers the items in sequence, so each one still gets its own number and reordering is easy
  • Tables: the key is to separate cells with " | ", that is "space, vertical bar, space"; you can also add leading and trailing bars and a header separator row, see the Markdown Cheatsheet in the adam-p/markdown-here wiki, but note that I have not had success with the heading separator on Confluence
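
The hints above can be sketched in Python as a few small helpers that build a Markdown document as a string. The helper names and the sample text are invented for illustration, and the "--- | ---" header separator row follows the GitHub convention, so it may not render everywhere.

```python
def hard_break(line: str) -> str:
    """End a line with two spaces so the next line starts on a new line."""
    return line + "  "


def numbered_list(items):
    """Start every item with "1. " and let the renderer do the numbering."""
    return [f"1. {item}" for item in items]


def table_row(cells):
    """Separate cells with space, vertical bar, space."""
    return " | ".join(cells)


lines = [
    hard_break("First line of a paragraph"),
    "second line, after a hard line break",
    "",  # a blank line starts a new paragraph
    "A new paragraph",
    "",
    "- bullet one",
    "- bullet two",
    "",
    *numbered_list(["first", "second", "third"]),
    "",
    table_row(["Name", "Format"]),
    table_row(["---", "---"]),  # header separator row (GitHub convention)
    table_row(["report", "PDF"]),
]
document = "\n".join(lines)
print(document)
```

Because every numbered item starts with "1. ", the items can be reordered in the source without renumbering anything.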

There is some good documentation at Daring Fireball, which describes the "original" Markdown. However, there are several variants and it is not always clear which one is in use. Use Mastering Markdown from GitHub Guides for "GitHub Flavoured Markdown", or GFM. Atlassian use CommonMark. I believe the community is moving towards CommonMark as the accepted standard; GitHub, for example, are moving that way.

If you visit Babelmark 2, which compares Markdown implementations, and try your Markdown there, you will see how different engines parse things slightly differently. It is a helpful resource.

PDF

This is the Portable Document Format, originally developed by Adobe but now an ISO standard. If you want a document to be readable on many different devices then it is a good choice. I generally use Adobe Acrobat Reader DC where possible. I also use PDFCreator, a free PDF converter that can create and merge PDF files, but Microsoft Office provides good support for PDF, as does LibreOffice. I have also looked at PDFsam, a free and open source tool for splitting and merging PDF files.

There are many libraries for processing PDF files; Apache PDFBox, a Java PDF library, is one of them.

YAML

These files are turning up with increasing frequency. Browser Web Extensions use YAML, as do Drupal 8 and other things like Kubernetes. For an introduction, it is worth a quick look at "Learn YAML through a personal example".