Wednesday, February 28, 2007

Electronic money and new User Interfaces

2007 is shaping up as the year of the new user interface. First we had the Wii, then the iPhone, and now the Economist is briefing us on the end of cash – apparently we will soon be swiping our mobile phones against retail terminals to pay for things.

The Economist pictures consumers swiping their phone, checking the details of their purchase on its screen, and finally clicking ‘ok’ and typing their PIN on their phone keyboard. They could also use their phone to check their bank balance.

Swiping a device is a wonderful user interface metaphor. It’s a physical way of requesting something – pointing to it and saying “I want this”.

Done via the web, the object you’re purchasing is represented at a URL. And swiping your phone is requesting that URL in your phone’s browser. This brings up a web page describing the object.

To achieve this in today’s browsers, you have to type in a URL, or click on a hyperlink. I see swiping as another method of doing this, which avoids fiddly typing – you are requesting a specific web page by reaching your hand out to the resource it represents.

Of course, on the web page is a form for confirming the details and authorizing the purchase. This is easily achieved using simple HTML.

What I’m proposing is that Near Field Communication (NFC) payments be built on internet technology: the data is represented in HTML and transported via HTTP.

There are several benefits to this approach:

  • Modern phones already have browsers, and HTML allows the use of images, styling and sophisticated user interface elements that consumers are already used to
  • Retail groups already have web infrastructure in place advertising their products and services
  • HTML allows tailored web forms, for confirming details and entering authorization.
  • Web security measures such as SSL and secret password fields can be reused.

Enhanced security

As it stands above, the security model is the same as used on the internet today. But there are ways to make it even stronger.

Because the mobile phone is uniquely personal, it can be used to provide extra authentication of the user. Many people have suggested using the phone in two-factor authentication. Usually, this is where you type in a PIN, the phone uses it to generate another number, and this second number is submitted. The phone generates a fresh number every minute, so the second number will only work for a minute – then it is useless.

I would suggest a new HTML Form element – the “pin” input element – where the browser should carry out this algorithm automatically based on whatever is typed in, so that the second number is submitted.
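As a rough illustration of what the browser might do behind such a “pin” field, here is a minimal sketch in Node-style javascript. It is not a vetted security scheme: the function name pinCode, the deviceSecret key shared between phone and bank, and the choice of HMAC-SHA256 are all my assumptions, loosely in the spirit of time-based one-time passwords.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a short-lived 6-digit code from a PIN.
// `deviceSecret` stands in for a key shared by the phone and the bank (an assumption).
function pinCode(pin, deviceSecret, nowMs = Date.now(), windowMs = 60000) {
  // The code changes once per window (one minute by default).
  const windowIndex = Math.floor(nowMs / windowMs);
  const digest = crypto
    .createHmac("sha256", deviceSecret)
    .update(`${pin}:${windowIndex}`)
    .digest();
  // Take the first four bytes as an unsigned integer and keep six digits.
  const number = digest.readUInt32BE(0) % 1000000;
  return String(number).padStart(6, "0");
}
```

The server would run the same calculation for the current minute and compare the result with the submitted number, so the PIN itself never travels over the wire.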

That way, the “swipe” becomes even more secure than normal internet transactions.

Business Model

Will the bank or the phone company handle these transactions?

In line with previous posts, I think it’s very difficult to see the phone company as anything other than the bit pipe along which data transmission occurs.

That’s because phone companies don’t have the expertise to handle financial risk, or the willingness to be regulated as banks.

However, I can certainly see phone companies making a tidy sum by creating exclusive deals with banks to handle the transactions, or at least by providing a default bank. In some cases this could be a white-labelling service, with a bank handling the underlying transactions under the banner of a phone company.

I can also imagine my phone offering me choices – for that sofa, I could choose to use a selection of credit cards, rather than my usual bank account.

And there are opportunities for retailers to learn more about their customers. Rather than having to issue “club cards”, why shouldn’t they simply associate membership with your phone? Whenever you make a purchase, this gives them more information about you, and allows them to supply tailored adverts and special offers to your text message inbox.

The power of the swipe

What could be more intuitive and simple than swiping at something to request it? By combining swipe technology with the internet, payment systems will be revolutionised, providing many benefits to consumers and retailers.

Web Office Suites

After a decade of inactivity in the office suite industry - other than Microsoft raking in billions - suddenly there is lots of news. Now we need a brave soul to question the underlying applications themselves - the word processor, spreadsheet, and presentation.

I've already posted on the rise of OpenOffice, its standard XML Format, and Google Apps. Each of these has a momentous effect on the industry - I predict that 80% of people will find browser-based suites provide all they need. Desktop suites will be consigned to power users.

But what hasn't changed is the basic package - word processor, spreadsheet, and presentation software.

I think it's time for another look at this triumvirate. User interfaces have moved on massively since these applications were first designed, and so has the underlying technology; HTML, CSS and javascript are great general purpose tools. The internet is a distribution model that supports new ideas and niche applications.

So why not start from the beginning? What do people use office applications for? Does it truly split into three different use cases (spreadsheet, word processor, and presentation)?

I often see spreadsheets with paragraphs of text in them - which would surely be better done in a word processor. And I've lost count of the Word documents and presentations I've seen with embedded tables of data.

So why not combine them all into one application? Imagine a WYSIWYG web page editor, where you could drag and drop text and shapes around the page. And further, imagine you could add tables to the page - each having full spreadsheet power (functions, financial / date-time formatting, sort, filter, etc). Just like spreadsheets, each table cell can depend on any HTML element on the page (or any other URL) for its value.

It isn't too tricky to combine applications if you leave out the bloat - adding power user features is the main approach Microsoft has taken to persuade us to upgrade.

And by combining applications into one, users are freed from having to make the choice of format (is this a Word document or a PowerPoint deck?) that often seems arbitrary at the beginning.

Microsoft themselves often hinted, during the 90s, that the future lay in integrating office applications. In the end, they settled with clunky COM components, inserting for example whole Excel spreadsheets inside Word documents, which is visually a mess.

I've played round with a few concepts and reckon that it's easily possible to write a combined suite using simply HTML, CSS and javascript (with VML / SVG to support drawing).

So come on, does anyone fancy helping me create a concept website?

Monday, February 19, 2007

HTML menus

Menus - like the file menu at the top of every application - have always been tricky to code in HTML. They involved reams of javascript and endless workarounds for the deficiencies in each browser.

The HTML 5 working group are talking about a new HTML tag to enable menus. This would be great, but for now, why not just use the ordered list tag <ol>? After all, that's what menus are - ordered lists.

So it's a great relief to see menus done properly - with no javascript in sight, just the <ol> tag and some CSS.

But before we congratulate ourselves on proving the power of HTML yet again, it's worth asking what menus are for in the first place.

I count three uses:

  1. Site navigation hyperlinks (e.g. the left hand pane of http://www.microsoft.com/sql/default.mspx)
  2. Standard application menus (e.g. the MS Word file menu)
  3. Context-sensitive application menus (e.g. right mouse button options)

On the web, most people just think of the first use, because web pages still aren't seen as applications in their own right. For those using web-based spreadsheets, however, uses 2 and 3 are more important - rather than navigating to different pages, the menu options manipulate the existing page.

Most desktop-based applications are just as poor at menus as web-based ones. Even commonly used ones - such as Internet Explorer itself - do a bad job here. There is a very complicated File-Edit-View-Favorites-Tools-Help, there are the standard buttons (back / forward / refresh), there is the address bar plus optional extra bars, and only then do you get the page itself, which will often have its own menus too.

So Microsoft is due some praise for recognizing this, and innovating with the new Office 2007 ribbon, which combines uses 2 and 3. Maybe they did it to stay ahead of websites like Google Docs; if so, the effect was diminished when they successfully mimicked it on the MS Office website!

How is a ribbon menu best achieved on the web? Well, in theory, just using additional <ol> tags and CSS. In practice, this is a nightmare without a properly CSS-compliant browser; even Internet Explorer 7 falls somewhat short. But never underestimate the resourcefulness of web developers - there's plenty of innovation to come using existing tools. I wouldn't be surprised if ribbons started appearing on websites very soon.

Context-sensitive menus are by far the rarest on the web. In Google spreadsheets, an HTML menu opens up when the right mouse button is clicked on a cell. To be frank, I'm not sure this is good practice - think of smartphones, PDAs, Tablet PCs, Apple Macs, and voice-activated browsers - none have a "right mouse button". And a huge proportion of users never think of using the right mouse button (see Jon Udell's discussion on saving web pages) - instead, context-sensitive menus should probably appear as part of a ribbon.

In fact, it's pretty easy to code context sensitive menus using simple javascript. For example, you could dynamically swap out the contents of a menu <ol> tag based on browser focus and DOM events.
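Here is a minimal sketch of that idea. The menu contents, the context names and the data-menu attribute in the commented-out handler are all made up for illustration; only the string-building part is exercised here, since it needs no browser.

```javascript
// Hypothetical menu definitions, keyed by the kind of element that has focus.
const MENUS = {
  cell: ["Cut", "Copy", "Paste", "Insert row"],
  image: ["Resize", "Crop", "Delete"],
  default: ["Save", "Print"],
};

// Build the <li> markup for whichever context is active.
function menuHtml(context) {
  const items = MENUS[context] || MENUS.default;
  return items.map((label) => `<li><a href="#">${label}</a></li>`).join("");
}

// In a browser, a focus handler could then swap the contents of the menu <ol>:
// document.addEventListener("focusin", (e) => {
//   document.getElementById("menu").innerHTML = menuHtml(e.target.dataset.menu);
// });
```

The <ol> itself stays in the page the whole time; only its list items change, so the CSS that styles the menu does not need to know about contexts at all.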

So I predict a migration to menus based on <ol> and CSS.

Continued user interface innovation will make menus as friendly and accessible as possible. Menus inspired by the Office 2007 ribbon will become more popular for true web applications, like Hotmail or Google Spreadsheets.

And context-sensitive menus will appear through the web, as developers realise the power of simple HTML, CSS and javascript.

Touch-screen displays on the web

Recently I saw a fabulous demo of touch-screen displays in action.

In the demo, the user is shown manipulating shapes on the screen with both hands - squeezing images, grabbing multiple shapes simultaneously and pushing them together, and simultaneous drag-and-drop.

This reminded me of Steve Jobs's iPhone demo, and my comments at the time that HTML can probably handle multi-touch UIs, but javascript might struggle.

Experience tells us that developers need several different methods to handle user interaction. There should be a simple method with default behaviours, and a more detailed method giving fine control; and there should be a declarative approach for XML developers, and a procedural approach for those that prefer scripting.

It's clear that several things will have to change before websites cater appropriately for touch screen displays.

Firstly, we need more support in CSS for simple effects like drag and drop (our simple, declarative method).
Secondly, we need more declarative support for animation (fine-grained, declarative method).
Thirdly, if there is no mouse, there is no right mouse button - so we'll need to rethink the approach for context sensitive menus. Microsoft have innovated here with the new 'ribbon' interface in Office 2007 - I'll save this piece for another post.

CSS user interaction

CSS styles fit the bill perfectly for a simple, declarative approach. If we add a series of user interface CSS styles, the user gets a consistent experience, and the developer doesn't have to worry about endless code:

  • draggable = "no | yes" - elements with this style can be moved across the page via user interaction.
  • resizable = "none | x | y | preserveAspectRatio | all" - elements with this style can be re-sized via user interaction, along either or both axes.
  • zoomable = "no | yes" - elements with this style are containers (e.g. <html> or <div> tags) and zooming commands are available on the contents of the container.
  • pannable = "no | x | y | all" - elements with this style are containers (e.g. <html> or <div> tags) and panning commands are available on the contents of the container (e.g. panning around Google Maps). This could be scrollbars, or some other user interface method, depending on the browser.

For each of these styles, the exact user interaction method doesn't matter to the web developer - it could be a mouse, a touch screen, voice commands, or something else, as set by the browser or the operating system. In some cases (e.g. touch screen) there could be multiple user interactions at the same time; that's all handled by the browser. All the web developer need care about is setting the appropriate styles.

Declarative animation

Anyone who's tried to program drag and drop knows that the DOM is painfully awkward at tracking certain user interactions - but imagine dragging two objects on a touch screen simultaneously! Which event object would you use?

The real pain here is for events like mousemove. These are "continuous events", a contradiction in terms which reveals the flaw in the underlying approach. For continuously evolving features, languages should use Functional Animation instead (see my previous post).

Imagine if the browser maintained user interaction state (mouse position, touch screen location, etc) in a read-only XML file directly accessible to developers. For example:

<pointers>
<pointer status="active" screenX="100" screenY="100" elementref="div0" relativeX="5" relativeY="5"/>
</pointers>

For the mouse, there would only be one <pointer/>, with "active" status when the mouse was down, and "inactive" when up. For touch screens, there would be any number of <pointer/> elements (including zero), each representing a finger or stylus touching the screen. The elementref attribute stores a reference to the element that the pointer is currently over, the relativeX and relativeY attributes store the location relative to this element, and the screenX and screenY attributes store the location relative to the screen.
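To make the touch-screen case concrete, here is a small sketch of how a browser might keep that state internally and expose it in the XML shape above. All the function names are hypothetical, and relativeX / relativeY are left out to keep it short.

```javascript
// Hypothetical pointer registry mirroring the <pointers> document above.
// Each entry corresponds to one <pointer/> element.
function createPointerState() {
  return new Map(); // pointer id -> { status, screenX, screenY, elementref }
}

// Record a finger or stylus touching (or moving across) the screen.
function pointerDown(state, id, screenX, screenY, elementref) {
  state.set(id, { status: "active", screenX, screenY, elementref });
}

// A lifted finger disappears from the list.
function pointerUp(state, id) {
  state.delete(id);
}

// Serialise the state in the XML shape used above.
function toXml(state) {
  const rows = [...state.values()].map(
    (p) =>
      `<pointer status="${p.status}" screenX="${p.screenX}" ` +
      `screenY="${p.screenY}" elementref="${p.elementref}"/>`
  );
  return `<pointers>${rows.join("")}</pointers>`;
}
```

Two fingers simply mean two entries, which is exactly the situation a single DOM event object struggles to describe.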

Once you have this file, you can do functional animation based on it. For example, using the XForms <bind> tag:

<bind infoset="id('img1')" calculatewhen="//pointer[@elementref = 'img1' and @status = 'active']">
  ./@css:left = //pointer[@elementref = 'img1']/@screenX;
  ./@css:top = //pointer[@elementref = 'img1']/@screenY;
</bind>

which once activated, binds img1 to the evolving location of the mouse / stylus.
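For readers more at home in script, the same rule can be sketched as a plain function over the pointer list. This is only my reading of what the <bind> above would compute; evaluateDragBinding is a made-up name and nothing here is part of XForms.

```javascript
// Hypothetical evaluation of the <bind> above against a list of pointers.
// Returns the new left/top for the element, or null while no active
// pointer is over it.
function evaluateDragBinding(pointers, elementId) {
  const p = pointers.find(
    (q) => q.elementref === elementId && q.status === "active"
  );
  if (!p) return null; // condition is false: leave the element where it is
  return { left: p.screenX, top: p.screenY };
}
```

The point of the declarative version is that the browser, not the developer, decides when to re-run this calculation.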

As you can see, this approach avoids the need to use javascript at all - event handlers and declarative functional animation are enough.

Touch screens are the future

Computer mice have been around for so long that it's tempting to see them as a permanent fixture in computing. But actually they're pretty unintuitive - remember seeing someone using a mouse for the first time?

As touch screens spread, web developers will be faced with an interesting set of challenges, which are best overcome using a few simple CSS styles, and a declarative approach.

Thursday, February 15, 2007

My Proposal: Functional Animation

In this blog entry I'm going to explain some of the detail behind my proposal for functional animation, which was introduced in a previous entry.

The goals are to allow any XML document node or CSS stylesheet property to:

  1. Evolve as an explicit function of time and other (possibly also evolving) nodes or CSS values f(t, a, b, c, …)
  2. Apply the function conditionally, or depending on events such as key presses
  3. For continuously varying values, set or reference their speed and acceleration

Goal 1 says that the language should be functional. Rather than calculating an incremental change every millisecond, as per javascript, the property evolves according to a function. For example, it might be "width = 2 * height" which would maintain the width at twice the height, no matter how the height evolves.

This requires us to incorporate a referencing language, so that each element can refer to the others. There are lots of them around – there’s one in CSS and another in SMIL – but there is a more powerful standard, XPath, to take advantage of. The other benefit of XPath is that it introduces standard mathematical and string functions. For example, why not specify "width = avg(//img[@class='ball']/@width)", which sets the width to the average of every image width with 'ball' class.

It also implies that the language works like a spreadsheet - any time a value changes, the effects can ripple through all the dependent values. Like Microsoft Excel, the system needs a dependency engine in order to quickly figure this out and work all the way down the dependency chain (looking out for circular references).
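A toy version of such a dependency engine might look like the sketch below. The Sheet class and its method names are mine, and for brevity it recalculates on demand when a value is read rather than pushing changes through as they happen; a real engine would more likely cache results and propagate updates.

```javascript
// A tiny spreadsheet-style dependency engine (illustrative names throughout).
class Sheet {
  constructor() {
    this.cells = new Map(); // name -> { deps, formula }
    this.values = new Map(); // name -> constant value
  }

  // A plain value with no dependencies.
  set(name, value) {
    this.values.set(name, value);
    this.cells.delete(name);
  }

  // A formula over other named values, e.g. width = 2 * height.
  define(name, deps, formula) {
    this.cells.set(name, { deps, formula });
    this.values.delete(name);
  }

  // Evaluate on demand, walking down the dependency chain.
  get(name, visiting = new Set()) {
    if (this.values.has(name)) return this.values.get(name);
    const cell = this.cells.get(name);
    if (!cell) throw new Error(`Unknown name: ${name}`);
    if (visiting.has(name)) throw new Error(`Circular reference at ${name}`);
    visiting.add(name);
    const args = cell.deps.map((d) => this.get(d, visiting));
    visiting.delete(name);
    return cell.formula(...args);
  }
}
```

With width defined as (h) => 2 * h over height, reading width after any change to height gives the doubled value, and defining height in terms of width as well is reported as a circular reference instead of looping forever.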

Finally, XPath allows us to work with t, the time. Unfortunately there’s no pre-defined time variable that starts from zero when the animation begins – but we can create our own, by using the XPath system clock function and measuring from when the animation began.

Goal 2 demands that our function language has some conditional statements in it, possibly dependent on node values. For example, perhaps an SVG image is programmed to be repelled by another image, but only if they get too close. Or perhaps it speeds up when the mouse clicks (which requires integration with the events model in HTML/XML).

There are two existing XML animation technologies that go some way to meeting these two goals. The first is Synchronized Multimedia Integration Language (SMIL), which despite its name incorporates a general-purpose model for animating any XML document. It is very straightforward, and works well for basic animations.

Unfortunately, for more complex animations SMIL has severe limitations, which stem from the fact that it doesn’t meet Goal 1 fully. The most basic limitation is that SMIL animations follow a pre-defined path. SMIL doesn’t handle situations where the desired path is dependent on unpredictable evolving conditions – for example, the mouse position, or the location of other moving objects or even a random number generator. SMIL also doesn’t incorporate XPath, which means it’s difficult to reference the values of other nodes.

The second existing XML animation technology is XForms. It may seem surprising that a web forms technology incorporates sophisticated animation functionality, but it does - because it includes a functional binding language based on XPath, which meets Goal 1. All it needs is a few simple extensions.

For example, consider the following XForms line:

<bind calculate="2 * ../@css:height" infoset="//img/@css:width" />

which takes every image tag in the document, and sets the width to equal twice the height (using the XPath statement). Now, any time the height of any image is altered, its width will automatically reset to be double.

This might not be sophisticated animation, but it’s not possible with SMIL, and let’s take things a step further:

<script>document.getElementById('divTime').setAttribute('timeStarted', Date.now())</script>
<bind infoset="id('divTime')">
./@timeElapsed = current-dateTime() - ./@timeStarted;
</bind>
<bind infoset="//img">
./@css:top = 200;
./@css:left = 100 + 100 * sin(id('divTime')/@timeElapsed);
</bind>

First I have used javascript to set the timeStarted attribute to the system time when the page loads. Next I have extended XForms so the contents of the <bind> element works just like a series of calculate attributes.

The first <bind> element sets up a counter – the timeElapsed attribute – that holds the number of seconds since the page loaded.

The second <bind> element animates every image on the page from side to side according to a sine function.
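Stripped of the XML, the two bindings amount to a pair of functions of the elapsed time t. The sketch below uses my own function names and assumes t is measured in seconds, as described above.

```javascript
// The second <bind> written as plain functions of elapsed time t (in seconds).
function imageTop(t) {
  return 200; // constant: the image only moves horizontally
}

function imageLeft(t) {
  return 100 + 100 * Math.sin(t);
}

// Elapsed seconds since a recorded start time, mirroring timeElapsed.
function elapsedSeconds(startedMs, nowMs = Date.now()) {
  return (nowMs - startedMs) / 1000;
}
```

At t = 0 the image sits at left = 100, and it swings between 0 and 200 as the sine term moves between -1 and 1.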

You can see the immediate parallels between the <bind> element and CSS. It’s just the same, except it uses XPath as a referencing tool, and you can assign functions to each variable, not just static values. You could even put the <bind> tags in a separate stylesheet, just like CSS. Or you could get rid of your existing .css files, and replace them with the syntax above.

If the W3C followed this approach, it would pull the XForms <bind> element into a separate XML Functional Animation spec, which would form a foundation for CSS and supersede most of SMIL.

I’d like to give some more examples to show just how powerful this approach is. First, I’ll introduce three more new pieces:

  • The calculatewhile attribute, which is an XPath boolean statement that controls whether the <bind> element should be run or ‘paused’
  • The new XPath function d_dt(), which sets or retrieves the rate of change (speed) of any node
  • The new XPath function d_dt2(), which sets or retrieves the acceleration of any node – the equivalent of d_dt(d_dt())

These pieces enable Goal 3 to be achieved:

<bind infoset="//img">
d_dt2(./@css:left) = - ./@css:left;
</bind>

which turns the images into simple harmonic oscillators (i.e. springs), vibrating backwards and forwards like a child on a swing.
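To see what an engine would have to do with a d_dt2() binding like this, here is a rough numerical sketch. It steps the equation forward with the semi-implicit Euler method, which is my choice for illustration and not something the proposal prescribes; simulateOscillator and its parameters are made-up names.

```javascript
// Numerically integrate d2x/dt2 = -x using the semi-implicit Euler method.
// `dt` is the step size in seconds; smaller steps track the true motion more closely.
function simulateOscillator(x0, v0, duration, dt = 0.001) {
  let x = x0;
  let v = v0;
  const steps = Math.round(duration / dt);
  for (let i = 0; i < steps; i++) {
    const a = -x; // acceleration, as in the <bind> above
    v += a * dt; // update speed first...
    x += v * dt; // ...then position, which keeps the energy bounded
  }
  return { x, v };
}
```

Starting from x = 1 at rest, the value swings to roughly -1 after half a period (pi seconds) and comes back to roughly 1 after a full period, which is the swing-like motion described above.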

Or consider:

<bind infoset="id('img1')">
d_dt2(./@css:left) = id('img2')/@css:left - id('img1')/@css:left;
</bind>
<bind infoset="id('img2')">
d_dt2(./@css:left) = id('img1')/@css:left - id('img2')/@css:left;
</bind>

which models two balls, joined by a spring.
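The same stepping idea extends to the coupled case. In this sketch (again with hypothetical names, and assuming both balls start at rest) each ball's acceleration is read straight off the two bindings.

```javascript
// Two equal masses joined by a spring: each accelerates toward the other.
function simulateSpringPair(x1, x2, duration, dt = 0.001) {
  let v1 = 0;
  let v2 = 0;
  const steps = Math.round(duration / dt);
  for (let i = 0; i < steps; i++) {
    const a1 = x2 - x1; // first <bind>
    const a2 = x1 - x2; // second <bind>
    v1 += a1 * dt;
    v2 += a2 * dt;
    x1 += v1 * dt;
    x2 += v2 * dt;
  }
  return { x1, x2 };
}
```

Because the two accelerations are always equal and opposite, the midpoint between the balls never moves; they just oscillate about it. Note that the equations as written describe a spring of zero natural length, so the balls pass through each other on each swing.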

Once you’ve thought about it, you realize that this approach to animation allows pretty much anything in classical physics to be modeled – wind resistance, friction, magnetism, gravity, etc. Which means it’s pretty useful in programming computer games! And there isn’t a tougher test for an animation language than this.

In summary, I think the XForms <bind> element contains, with a few simple extensions, everything the XML developer needs to produce world-class animations.

Monday, February 12, 2007

Animating the web: functional styles

From the first time I saw it, I’ve thought that the equals sign in programming languages is wrong. In maths, the statement x=5 says that no matter what happens, x will always equal five. In C++, or Java, or Visual Basic, it only sets x instantaneously to 5; x can still evolve over time.

This is important, because mathematical symbols (like the equals sign) are a result of centuries of learning about the best way to represent fundamental concepts. They have proved their worth time and time again in explaining the natural world. In fact, many revolutions in science have only taken place through the introduction of new symbols – Newton’s differentiation and integration symbols to explain dynamics, Einstein’s use of Riemannian geometry in General Relativity, and Heisenberg’s use of matrices in Quantum Mechanics are cases in point - but the equals sign has remained constant.

So why do programming languages not follow standard mathematics?

Actually, there's one that does - spreadsheet formulas. It's probably why Excel formulas are the only language to spread out of the IT department. In Excel, if you set cell A2 to equal A1 + 5, then this will always be true – when A1 changes, A2 will update automatically to maintain the equality.

Excel handles the equals sign properly because it’s a functional programming language. Except for spreadsheets, functional languages are niche - the most common apart from Excel is probably Lisp, which was invented back in the late 1950s.

Since functional programming expressions stay true even as the program evolves, they come into their own in animations.

Think back to spreadsheet functions, and imagine if you could write an HTML expression like this:

          <img id="img1" src="img1.jpg" width="10" />
          <img id="img2" src="img2.jpg" width="=2*img1.width" />
         
Note the extra equals sign, as per Excel, controlling img2 width. If img1 were resized, img2 would automatically resize too, in order to maintain its double width.
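Today's javascript can imitate this for plain objects using a getter, which gives a feel for the behaviour. This is only an analogy using stand-in objects, not real <img> elements, and not how the proposed attribute would be implemented.

```javascript
// A plain object standing in for img1 (no real DOM involved).
const img1 = { width: 10 };

// img2.width is defined by a function, so it is recomputed on every read.
const img2 = {
  get width() {
    return 2 * img1.width;
  },
};
```

Setting img1.width to 15 means the next read of img2.width gives 30, with no event handler in sight.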

If you’ve ever tried to enable drag and drop in Javascript, you’ll know how awkward it is. With the functional approach, it’s just one line of code:

 <img id="img1" style="left:=mouse.left; top:=mouse.top" begin="img1.mousedown" end="img1.mouseup" />

This uses a simple condition to control when the animated style should work.

Once you see how this could work, you realise how cumbersome the procedural events model is in Javascript. The setInterval() function is unreliable (since you can't rely on processing speed) and inelegant - far better to think continuously, rather than triggering new events every millisecond.
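A small made-up example shows the difference. Suppose an image should move at 100 pixels per second and a 10 ms timer is used to nudge it along by 1 pixel per tick; the function names and numbers below are purely illustrative.

```javascript
// Procedural style: add a fixed step on every tick and hope ticks are regular.
function positionByTicks(tickCount, stepPerTick) {
  return tickCount * stepPerTick;
}

// Functional style: position is a function of elapsed time alone.
function positionByTime(elapsedMs, pixelsPerSecond) {
  return (elapsedMs / 1000) * pixelsPerSecond;
}

// Suppose the timer was meant to fire 100 times in a second but, on a
// busy machine, only fired 80 times.
const byTicks = positionByTicks(80, 1); // 80: the animation has fallen behind
const byTime = positionByTime(1000, 100); // 100: correct regardless of tick count
```

The tick-based version drifts whenever the machine is busy, while the time-based version is right whenever it happens to be evaluated.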

I've called this approach "functional styles", because it's basically a functional programming extension to CSS styles. In this approach, any CSS style can be animated by assigning it a function (via the equals sign), rather than a direct value. And these functions have access to two continuously varying variables - the mouse position, and the time variable t.

Functional styles would open up animation on the web. Think PowerPoint animations, think Flash timelines, think interactive games, think interactive graphs and charts. In fact, think web spreadsheets! All currently require mountains of javascript and a very fast processing engine. Using functional styles, they would simply require CSS.

In a later post, I will go into the details of how this could technically work - there are very few elements other than basic CSS, XPath, and SMIL. It's possible to prove that the whole of classical physics can be incorporated into functional styles - gravity, friction and wind resistance, angular momentum, electric fields, and magnetism are all a matter of getting your CSS functions right.

For now, think how the simple equals sign, done properly, enables rich animations on the web.