Tuesday, July 10, 2007

Free your data: use HTML tables

Recently I've concluded that spreadsheets are not an optimal way to manage tabular data.

That's quite a claim - Excel earns billions for Microsoft every year, and the recent upsurge for OpenOffice is about standards and open source, not about form and functionality. Even Google has faithfully represented the spreadsheet as an online application.

So, let's review the problems with spreadsheets:

  1. Page layout, text, and multimedia. You can't simply relegate this to MS Word - people want to annotate, explain and display their data professionally. Spreadsheets are awful at this.
  2. WYSIWYG. Spreadsheets display endless rows and columns, no matter the amount of data.
  3. Semantic. You can't distinguish headers, footers, or captions, except through styling. That prevents spreadsheets from being properly computer readable.
  4. Storing data. it's hidden in a binary or zip file, among heaps of formatting, configuration data, etc.
  5. Linking to external data especially on the internet - if you can only analyse your own data, you're missing a lot out - e.g. mashups.
  6. Publishing data If you give someone a spreadsheet, they can edit all the cells - unless you rely on hopelessly insecure password protection!
  7. Collaboration - online discussions, versioning, and synchronous editing

The first three issues are fundamental and inherent with any spreadsheet. The final three are inherent with client-based spreadsheets, but could be partially solved using online tools like Google Spreadsheets.

Using web standards for tabular data

Let's take a step back and categorize everything that spreadsheets do with tabular data, and whether there are any internet technologies with equivalent functionality:

Spreadsheet capabilities, Web technologies
CapabilityDescriptionWeb technology
Store Visually in cells, semantically in data format HTML tables
Transform Sort, group, filter, pivot, consolidate, chart DOM / XSL
Style Borders, shading, text formatting CSS
Model Functions, calculated values, goal seek Javascript, XPath

All these use cases - with the possible exception of conditional formatting, where CSS falls short - can be simply achieved using HTML, CSS, XSL, and a bit of javascript. The browser can beat both MS Excel and OpenOffice - we can free our data.

So what does it look like?

The solution is an editable HTML table inside a web page. The table contains all the spreadsheet functionality you need - sorting, grouping, functions, etc - but rather than taking up the whole page, it's just part of the page, and only contains the amount of cells you need. This allows analysts to surround their data with website text, images, or video (solving problem 1).

Only the appropriate number of rows and columns are displayed in the table - if you want more, you can add them (solving problem 2). This makes the page much more natural and avoids existing problems with people getting lost at the 64,000th row.

HTML tables are the best semantic way to store tabular data (solving problem 3), since there are a range of elements - from rows, to columns, to header and footers and captions - to label the contents. And because it's a web page, all sorts of collaboration, linking, and publishing techniques are immediately available (solving problems 4-7).

New use cases

There are plenty of new opportunities once you use the web to manage tabular data. None are particularly feasible using spreadsheets:

  • mashups
  • new widgets, e.g. maps
  • embedded microformats (e.g. addresses, calendars, etc)
  • version management (rather than endless versions on corporate C: drives)
  • publishing and read-only tables
  • extensibility - new functions & transformations
  • using the web as a database, e.g. DabbleDB

Web Data Management

I don't think this even requires a new online application. You can easily imagine it being part of a blogging website - when you insert a table into your blog entry, the spreadsheet functionality immediately becomes available.

It's just a matter of imagination and time before this happens - and personally, I can't wait!

No comments: