OOXML: Join in the Bug hunt

Let’s join in the bug hunt [Updated: See below]

Rob Weir has produced another little gem of an analysis: a reasonably scientific search for errors and bugs in the OOXML specification. The idea is to see just how many errors were really caught in the original five-month review of the 6,045-page specification, and how effective the BRM, held in Geneva a couple of weeks ago, was at fixing them.

His initial findings do not make very comforting reading…

… I’m not done with this study yet. I’m finding so many defects that recording them is slowing me down considerably. But since this is topical, I will list what I have found so far, based on the first 25 random pages, or 1/8th completion of my target 200. I’ve found 64 technical flaws. None of the 64 flaws were addressed by the BRM. Among the defects are some rather serious ones such as:

  • Storage of plain text passwords in database connection strings
  • Undefined mappings between CSS and DrawingML
  • Errors in XML Schema definitions
  • Dependencies on proprietary Microsoft Internet Explorer features
  • Spreadsheet functions that break with non-Latin characters
  • Dependencies on Microsoft OLE method calls
  • Numerous undefined terms and features

… this doesn’t look good, does it? Not only am I finding numerous errors, these errors appear to be new ones, ones not detected by the NB 5-month review, and as such were not addressed in Geneva. Since I have not come across any error that actually was fixed at the BRM, the current estimate of the defect removal effectiveness of the Fast Track process is < 1/64 or 1.5%. That is the upper bound. Of course, this value will need to be adjusted as my study continues. However, it is starting to look like the Fast Track review was very shallow and detected only a small percentage of the errors in the DIS.
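The bound in the quote is simple arithmetic. Here is a minimal Python sketch of it; the function name is our own illustration, not Rob's, and the only inputs are the two counts quoted above (64 flaws found, none of them fixed at the BRM).

```python
from fractions import Fraction


def defect_removal_effectiveness(fixed: int, found: int) -> Fraction:
    """Fraction of the sampled defects that the review process removed."""
    if found <= 0:
        raise ValueError("need at least one sampled defect")
    return Fraction(fixed, found)


FOUND_IN_SAMPLE = 64  # flaws found in the first 25 random pages
FIXED_AT_BRM = 0      # flaws among them that the BRM addressed

observed = defect_removal_effectiveness(FIXED_AT_BRM, FOUND_IN_SAMPLE)
# Not a single fix among 64 defects keeps the estimate below one in 64.
upper_bound = Fraction(1, FOUND_IN_SAMPLE)

print(f"observed effectiveness: {float(observed):.2%}")
print(f"upper bound: < {float(upper_bound):.2%}")
```

The bound works out to 1.5625%, which the quote gives as roughly 1.5%; as Rob says, it will move as the sample grows.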

If you fancy helping Rob and the rest of the free world, he has listed the page numbers (chosen at random) from part 4 of DIS29500 that should be examined in detail for errors and the like. Here are just a few of the page numbers (out of 200) to check:

… 1102, 1611, 3016, 2646, 3083, 5105, 747, 1142, 2596, 845, 626, 4047, 1415, 5143, 3997

The fact that he finds numerous NEW errors in examining just the first few pages indicates, yet again, that this specification should never have been Fast Tracked in the first place and is simply not in a fit state to be declared an ISO standard.

It really does make me wonder how certain National Bodies can be even remotely sincere when they decide to approve such a badly written specification.

[Update: In the few scant hours since Rob’s article was originally published, readers have been posting what they found; go and check the comments section! Even if you aren’t a software engineer, the errors and inconsistencies being reported are simply mind-boggling. How on earth can this specification be approved as an international standard when it is just so bad?]

OOXML is REALLY BAD (But then we already knew that didn’t we?)

I know there has been a great deal of written commentary about why OOXML should or should not become an ISO standard. Just look back through this blog for many of them 😉

Recently, however, Rob Weir has documented an incredibly simple, telling and fascinatingly descriptive demonstration of exactly why OOXML is such utter crap and should never become a standard in its current format. He shows how a simple formatting rule looks when saved using OOXML-based or ODF*-based applications. See Rob’s table and comments below:

* ODF is the ISO-approved standard for document formats and is used by many platform-independent applications such as OpenOffice.org and KOffice.

… let’s take a look at how OOXML and ODF represent a staple of document formats: text color and alignment. I created six documents: word processor, spreadsheet and presentation graphics, in OOXML and ODF formats. In each case I entered one simple string “This is red text”. In each case I made the word “red” red, and right aligned the entire string. The following table shows the representation of this formatting instruction in OOXML and ODF, for each of the three application types:

Format | Text Color | Text Alignment
OOXML Text | <w:color w:val="FF0000"/> | <w:jc w:val="right"/>
OOXML Sheet | <color rgb="FFFF0000"/> | <alignment horizontal="right"/>
OOXML Presentation | <a:srgbClr val="FF0000"/> | <a:pPr algn="r"/>
ODF Text | <style:text-properties fo:color="#FF0000"/> | <style:paragraph-properties fo:text-align="end"/>
ODF Sheet | <style:text-properties fo:color="#FF0000"/> | <style:paragraph-properties fo:text-align="end"/>
ODF Presentation | <style:text-properties fo:color="#FF0000"/> | <style:paragraph-properties fo:text-align="end"/>

The results speak for themselves.

What is the engineering justification for this horror? I have no doubt that this accurately reflects the internals of Microsoft Office, and shows how these three applications have been developed by three different, isolated teams. But is this a suitable foundation for an International Standard? Does this represent a reasonable engineering judgment? ODF uses the W3C’s XSL-FO vocabulary for text styling, and uses this vocabulary consistently. OOXML’s representation, on the other hand, appears incompatible with any deliberate design methodology.

I fear that before we can tackle harmonization of ODF and OOXML, we will first need to harmonize OOXML with itself!

I love how convoluted and completely indecipherable the OOXML representations are; and they are all different depending on which application you use! It’s laughable really.
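To see what “all different” means for anyone writing software against the two formats, here is a hypothetical Python sketch of the normalisation layer an implementer would need. The values come straight from the table above; the function names are our own, we assume left-to-right text (so ODF’s “end” means right-aligned), and we rely on SpreadsheetML’s rgb attribute carrying a leading alpha byte (AARRGGBB), which is why the spreadsheet value has eight digits.

```python
# Rows from the table above: (application, colour value, alignment value).
OOXML_ROWS = [
    ("text", "FF0000", "right"),      # <w:color w:val=...>  <w:jc w:val=...>
    ("sheet", "FFFF0000", "right"),   # <color rgb=...>      <alignment horizontal=...>
    ("presentation", "FF0000", "r"),  # <a:srgbClr val=...>  <a:pPr algn=...>
]
ODF_ROWS = [(app, "#FF0000", "end") for app in ("text", "sheet", "presentation")]

# Each OOXML application type spells "right-aligned" its own way.
OOXML_ALIGN = {
    "text": {"right": "right"},
    "sheet": {"right": "right"},
    "presentation": {"r": "right"},
}


def ooxml_normalize(app: str, color: str, align: str) -> tuple:
    value = color.upper()
    if app == "sheet":
        value = value[2:]  # drop the leading alpha byte of AARRGGBB
    return "#" + value, OOXML_ALIGN[app][align]


def odf_normalize(app: str, color: str, align: str) -> tuple:
    # One rule for all three applications: fo:color is already #RRGGBB, and
    # fo:text-align="end" is right-aligned for left-to-right text.
    return color.upper(), {"end": "right"}[align]


print({ooxml_normalize(*row) for row in OOXML_ROWS})
print({odf_normalize(*row) for row in ODF_ROWS})
```

Both lines print the same single pair, ('#FF0000', 'right'). The difference is that the OOXML side needs a special case per application, while the ODF side needs one rule.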

Honestly, how on earth can the US NB, for example, have just announced its decision to vote yes to OOXML, unless it has been thoroughly corrupted, as have so many of the other ISO National Bodies and sub-committees involved in this whole sorry saga?

To my mind there will be two losers if OOXML becomes IS-29500:

  • Us – That’s all of us as consumers and users of electronic documents
  • ISO – They have already lost a great deal of respect and credibility. If OOXML passes they will have none left. They will become an irrelevance in technology standards at least.

I can see the IETF (The body responsible for much of what has made the Internet work) becoming a far more important standards setter going forward…

On the OOXML BRM (#2)

There is a new report on the events of last week’s BRM in Geneva from one of the delegates. It is articulate and, I can only assume by the language and tone, highly accurate too.

Rob Weir’s erudite description of the proceedings is well worth your time. What happened when delegates had to process a stupid number of issues about a badly written specification in five working days absolutely beggars belief. Please go and read it in full.

We are in the 21st Century. Surely we have better brains and know-how than this?

… As the meeting progressed into Thursday, the tension mounted. As new issues were identified, they were taken off-line and told they could be brought up “Friday morning”. But no one really believed that. It was clear that there was not enough “Friday morning” to go around.

Thursday 9:20am, a delegation objects that they were told only to review Ecma’s responses to their own comments, and that there was never sufficient time to review all 1,000 Ecma responses since January 14th. ITTF’s response: “Nothing we can do about it in the rules — Nothing we could have done in our judgement”.

2:18pm the Convenor announces “This is zero hour”.

There is clearly not even enough time to fully discuss in the meeting the resolution of items that were taken off-line for further discussion. The US is not allowed to present our multi-part proposals to the meeting. We are told to get consensus outside of the meeting first, so it can be brought up for quick approval.

Into Friday the BRM spirals further downwards. The issue is not now that NB’s cannot raise new issues. The problem is now that NB’s who have been diligently working on issues off-line with other delegations, meeting over lunch, or early in the morning or into the evening, may not be able to have their proposals heard and acted on.

There simply is not enough time. The anxiety-driven, frantic delegates push even harder. More resolutions are approved with 2 or 3 delegations trying to raise objections, but without being recognized. Tempers grow short. One highly respected Head of Delegation, of unimpeachable reputation and experience started to voice an objection “I am extremely disgusted by the way procedures have been…” before being called out of order by the Convenor, saying that discussion of procedural issues will not be allowed. Another delegation tries to raise a new issue, as they had for the last two days without luck. “We’re using the public money from NNN to come here to speak on our issue. Can we speak on our issue?” Convenor – “We have run out of time.”

And so the BRM came to an end, with the announcement of the results of the paper ballot. Four delegations gave blanket approval to every Ecma comment (Cote D’Ivoire, Czech Republic, Finland, Norway) and three gave default disapproval positions on all undiscussed Ecma responses (India, Malaysia, United States). Most delegations gave a default abstain position, or registered no position. The net is that, although the discussions on Monday and Tuesday demonstrated that the quality of the Ecma responses was such that almost every one required substantial off-line work to make it acceptable, we gradually lowered our standards, so that by week’s end, we approved 800+ comments without any discussion, even in the presence of clear objections.

I want to make it clear that I in no way wish to criticize the Convenor. I think Alex did a remarkable job in trying to carry out his duties and be fair in this no-win situation. He was given an impossible task and had to find out how to fail in the least offensive way. There is an art to crash-landing an airplane and we must acknowledge that.

Here’s what I think should happen:

  • The ISO/IEC should be ashamed. They should apologise and instigate an immediate review of procedures and suspend all further activity until they stop this kind of fiasco from happening again.
  • Microsoft should be hanged, drawn and quartered. Perhaps the EU will see to that in due course.
  • ECMA should be banned from submitting ANYTHING to ISO/IEC ever again.
  • ECMA-376 (DIS29500, OOXML) should be thrown out today.
  • Microsoft/ECMA should refund all delegations their expenses for this total farce.

On the OOXML BRM

Well, that’s it. The BRM is over.

We are starting to get some reports about what happened and what sort of outcome is to be expected.

Let’s start with Brian Jones of Microsoft. Here’s his take on what went on.

Well, the BRM is over and I can only describe the week as a lot of technical work and a lot of great people I was lucky enough to meet and exchange ideas with. The objective of the BRM was to work with all of the National Body delegations in the room and improve the specification on a technical level — and that we did. There were many technical changes the delegates made to really get consensus on some of the more challenging issues, but all of these passed overwhelmingly once they were updated. The process really worked (it was very cool).

The meeting closed with clapping and cheering, folks were really happy about the improved proposals for the specification and it was a very positive experience for me personally.

Sounds like a good result, doesn’t it? And here’s another Microsoft chappy obviously having a good time in Geneva… Not sure how much work he’s been doing, although his masseuse looks nice!

So, one might assume from the above that all is well in the land of ISO and ECMA and the future is bright for the OOXML specification.

But wait, let’s see what some other people thought about it:

Andy Updegrove has collated a number of sources of information and written a fairly intricate analysis of what the results will actually mean. It certainly doesn’t sound like Brian’s comment above:

… but all of these passed overwhelmingly once they were updated. The process really worked (it was very cool).

Here’s Andy’s post and here’s an extract:

There are two ways in which you may hear the results of the BRM summarised by those that issue statements and press releases in the days to come. Perhaps inevitably, they are diametrically opposed, as has so often happened in the ODF – OOXML saga to date. Those results are as follows:

98.4% of the OOXML Proposed Dispositions were approved by a three to two majority at the BRM, validating OOXML

The OOXML Proposed Dispositions were overwhelmingly rejected by the delegations in attendance at the BRM, indicating the inability of OOXML to be adequately addressed within the “Fast Track” process

Interesting. I suggest you read the whole piece to find out why the second of the two results is closer to the mark.

We also have some comments from members of the Malaysian delegation, whose headline states:

“BRM in Geneva is over: big failure for OOXML”

They go on to say:

As you might have guessed, the five day meeting failed to properly address the huge amount of comments and proposed dispositions, and a rushed vote on Friday tried to lump together all unresolved issues in a package where the ECMA dispositions were to be voted on without any discussion. Needless to say, that failed miserably. Only ten national delegations voted, and only 4 P-members were for approval. 4 P-members disapproved, a whopping 15 abstained, and 2 even refused to register a vote in protest.

As one would expect, they conclude that:

If you count all voting delegates, including those who are not P members, the vote was 6 approvals, 4 disapprovals, 18 abstentions and 4 refusals to vote. Expect this to be announced by Microsoft as a “3 to 2 majority for OOXML approval” in the next few hours. The reality is of course that this is a huge setback for Microsoft. The tricks they have been trying have backfired, and it is now more clear than ever before that OOXML is an immature specification which was totally inappropriate for the fast track procedure.
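Both headlines can be produced from the same tally, depending on what you divide by. This short Python sketch (our own illustration, using only the figures quoted above) computes the yes-to-no ratio, which ignores abstentions and refusals, alongside the share of approvals among every delegation recorded.

```python
from fractions import Fraction

# Figures as quoted above from the Malaysian delegation's report.
ALL_DELEGATES = {"approve": 6, "disapprove": 4, "abstain": 18, "refused": 4}
P_MEMBERS = {"approve": 4, "disapprove": 4, "abstain": 15, "refused": 2}


def yes_to_no(votes: dict) -> Fraction:
    """Approvals per disapproval, ignoring abstentions and refusals."""
    return Fraction(votes["approve"], votes["disapprove"])


def yes_share(votes: dict) -> Fraction:
    """Approvals as a fraction of every delegation recorded."""
    return Fraction(votes["approve"], sum(votes.values()))


for name, votes in (("all delegates", ALL_DELEGATES), ("P-members", P_MEMBERS)):
    print(f"{name}: yes:no = {yes_to_no(votes)}, yes share = {yes_share(votes)}")
```

For all delegates this prints a yes:no ratio of 3/2, the “3 to 2 majority”, but a yes share of only 3/16; for the P-members it prints 1 (a 4 to 4 tie) and 4/25.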

Hmmm, this doesn’t sound like the report from Brian Jones at the beginning does it?

I’d like to finish with some words from Tim Bray, who was a member of the Canadian delegation. In his blog post he is clearly not enthralled by the experience:

The process was complete, utter, unadulterated bullshit. I’m not an ISO expert, but whatever their “Fast Track” process was designed for, it sure wasn’t this. You just can’t revise six thousand pages of deeply complex specification-ware in the time that was provided for the process. That’s true whether you’re talking about the months between the vote and when the Responses were available, the weeks between the Responses’ arrival and the BRM, or the hours in the BRM room.

And in scathing language, Tim goes on:

This was horrible, egregious, process abuse and ISO should hang their heads in shame for allowing it to happen. Their reputation, in my eyes, is in tatters. My opinion of ECMA was already very negative; this hasn’t improved it, and if ISO doesn’t figure out a way to detach this toxic leech, this kind of abuse is going to happen again and again.

Toxic Leech. I like that! That’s a great expression.

There is some excellent advice from Tim too about what the NBs might like to do with regard to their final decision:

The national bodies that voted on the first round have thirty days to decide if they want to change their vote. I totally don’t believe that ECMA/Microsoft is going to be able to pull together a revised draft of this Frankenstein’s monster in that timeframe. That seems like a pretty serious process issue to me, too.

In practise this means that the heavy politics starts Monday morning. National bodies that are smart will make their decision between 8:30 and 9:00 AM on March 3rd and immediately go on long vacations in Tasmania or Nunavut.

So, who would you tend to believe? A convicted monopolist that has just been fined $1.4bn for ignoring the EU’s judgement of 2004. A business which has been found to be paying NGOs in India to write to their Government to support OOXML, and providing them with the form letters too!

Or, the rest of the world? I know where my hat rests. Do you? Think about it next time you need to buy some software. Really, Really, Think About It…

The Deprecated “Smoke Screen” of MS Office Open XML (OOXML)

BSI British Standards states:

“… a standard is an agreed, repeatable way of doing something. It is a published document that contains a technical specification or other precise criteria designed to be used consistently as a rule, guideline, or definition. Standards help to make life simpler and to increase the reliability and the effectiveness of many goods and services we use. They are intended to be aspirational – a summary of good and best practice rather than general practice. Standards are created by bringing together the experience and expertise of all interested parties such as the producers, sellers, buyers, users and regulators of a particular material, product, process or service.”

In an effort to win quick converts to its bid to have Microsoft Office Open XML (MOOXML) accepted as an ISO standard, Microsoft is deprecating parts of its widely criticized MOOXML. But whatever the new Microsoft OOXML file format with deprecated parts eventually looks like (if such a format ever appears in an actual application), these cosmetic changes don’t really make a difference for Microsoft or the world. Neither Microsoft Office 2007 nor the version after it is ever likely to produce a standards-compliant format. Besides, OpenDocument has been around for a few years now and is becoming widely supported in industry, yet there has been no meaningful movement from MS towards supporting it. Actions speak louder than words.

What is described in the ECMA OOXML specification is not what is currently implemented in MS Office 2007. The actual specification says ECMA OOXML is a format that Microsoft Office 2007 can *read*. Note, however, that it is not the format that Microsoft Office 2007 actually *writes*. For example, the scripts, macros, passwords, SharePoint tags, hooks, DRM and other tie-ins used by MS Office 2007 are not part of the ECMA OOXML specification. If you try encrypting a document in Office 2007, it is no longer even a zip file + XML at that point. There is no reference editor application for Office Open XML, so an application can send Office Open XML files to Microsoft Office, and Microsoft Office can open those files, but any edits are saved in a different format!
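The encryption point is easy to check for yourself, because the two kinds of container start with different bytes. The Python sketch below is our own illustration. It assumes, as Microsoft’s published Office document cryptography documentation (MS-OFFCRYPTO) describes, that an encrypted Office 2007 document is stored in an OLE2 compound file, which begins with the eight bytes D0 CF 11 E0 A1 B1 1A E1, rather than in a zip package.

```python
import io
import sys
import zipfile

# Leading bytes of an OLE2 / Compound File Binary container, the same
# container family used by the legacy .doc/.xls/.ppt formats.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def container_kind(data: bytes) -> str:
    """Classify a document by its container, not by its file extension."""
    if data[:8] == CFB_SIGNATURE:
        return "OLE2 compound file (not a zip package)"
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip package"
    return "unknown"


if __name__ == "__main__":
    # Usage: python container_kind.py report.docx protected.docx
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", container_kind(f.read()))
```

Run against a plain and a password-protected .docx saved from Office 2007, it should report a zip package for the first and a compound file for the second, even though both carry the same extension.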

Launch Microsoft Office and try to save a file in the format specified by the draft standard at ISO. You can’t. There is no compatibility mode in Microsoft Office that restricts documents to the feature set specified in the official Microsoft Office Open XML draft ISO standard. Any suggestion of interoperability for anyone wanting to support the Microsoft Office Open XML specification is ridiculous, especially since Microsoft itself won’t allow its customers to write to that format.

Microsoft will NOT change its Office program to become compliant with ECMA. The marketing firms on retainer will simply advertise loud and clear that “Microsoft OOXML is now an ISO standard”, blurring the differences between MS OOXML, ECMA OOXML and ISO OOXML. This will do the trick for most people, who are not technical experts, but they will eventually get caught again in the confusion. Microsoft is not concerned about what the global community needs, but is acting strictly to protect its monopoly.

Deprecating some controversial features is itself a sign of the format’s significant failures. Shuffling chapters around and putting some parts in the annex is not the answer to technical shortcomings. Such aggressive proposals, at this time, seem geared more towards “Talking Points” than towards a sincere interest in creating a truly open standard.

There are still major problems with the format as now proposed in its deprecated form: cultural and linguistic adaptability problems, accessibility issues, reliance on the MS Windows product, references to the so-called “DEVMODE” structure, increased patent problems, and added harmonisation and interoperability problems, such that third-party implementation remains almost impossible. And there are many, many other problems with MOOXML as an ISO standard. Let us not forget that the proposed format has never been implemented or tested. Indeed, one wonders whether MOOXML can be tested or implemented by any vendor other than Microsoft. MOOXML is still far from achieving acceptance as a true standard.

The fact is that even MS Office 2007 itself has not implemented the initially proposed ECMA format. So it seems clear that the new “smoke screen” proposals will never be implemented, not even by Microsoft, let alone by third-party vendors. The proposals also doom all the .docx files already out there. Is MS ready to carry out a product recall, or to develop another converter for this problem? Not likely.

Moving stuff into deprecated status does not ease the burden of implementing DIS 29500. The TRUTH IS that every application will need to support the deprecated features in order to read files with the deprecated features.

The legacy binary formats remain closed. If a file was converted from an older Microsoft Office format, and DIS29500 allows the old file simply to be wrapped in XML, it remains unreadable for everyone else. OOXML is still a closed spec tied into many proprietary formats.

ECMA-376 is a bomb disguised as a standard. It redefines functions and components just to retain ties to the undocumented legacy formats. Therefore a number of things that should be fixed by now, thanks to better engineering and existing ISO standards, are left not only unfixed, but even perpetuated by ECMA-376. Why? There is a difference between preserving old files and moving them to a new format with all the same internal bugs. In essence, Microsoft is shoving their own mistakes right down the throat of ECMA/ISO. Microsoft has the audacity to appear to be saying that the standard meets a different need, when all it seems to mean is: “we don’t wanna fix our bugs, because that would force us to use standards, and that is unacceptable to us.” Unfortunately, the new proposals illuminate this unchanged and obstreperous position.

Furthermore, the proposed deprecations increase the already dramatic overlap with the established ISO standard for office documents. They create new patent problems, in that MS now reserves the right to sue you if you implement any of the deprecated material moved to the annex of the proposed standard. They make harmonisation and interoperability worse than ever, because without the code for interpreting the deprecated items, any file with deprecated data will be impossible to read properly. The problem is obvious, yet it persists.

Given how much Office 2007 would have to be changed, and the extensive coding work that would need to be done, don’t you think it is simply wiser to reject OOXML as an ISO standard, because it is not one, and for Microsoft to collaborate on the development of ODF and create one universal file format for everyone?

The Culture of Self Interest is not Open


ABOVE: Comes vs. Microsoft Plaintiff’s Exhibit. The original version is available in PDF.

So let us be clear: an ISO standard should benefit everyone and should be developed by consensus, for fair competition and through open participation, for all to embrace, enhance and share. DIS29500 as now proposed still serves, and will always serve, only the commercial interest of one vendor: Microsoft. This is the way the OOXML format was designed. It was designed to cement their monopoly in place. Microsoft will promise the National Bodies that it will fix the OOXML format “later”, but as this standardisation process has shown so far, Microsoft doesn’t keep promises.

Unless wasting time is part of the current marketing tactics used by Microsoft, the most advantageous action would be for that company to accept the standing invitation to collaborate on the development of the established standard, the OpenDocument Format, and to create one universal file format for everyone – the fundamental purpose of standardisation.

This article was originally written by Russell Ossendryver.

Big Blue on OOXML

Some will probably say “it’s about time too”…

IBM has made public an article written by Peter Seebach called “OOXML: What’s the big deal?”.

In it Peter explains in clear and unambiguous language why Microsoft’s OOXML document format (also known as DIS29500 or ECMA-376) is not fit to be an international standard.

Stating what has already been said many times before might be construed as boring or repetitive, but in this case Peter gives a refreshingly concise review and summary of the main issues, many of which have been lost in the verbosity and plethora of opinion and conjecture that abounds on the web regarding OOXML.

Here are a couple of salient comments from the piece:

There have been a number of technical complaints made about OOXML. Every one of them comes down to the same base complaint: Rather than specifying a reasonable common interchange format, OOXML specifies the whole feature set of Microsoft Office, down to bug compatibility. This creates a burden on other implementers which is simply unreasonable (and in fact impossible) to meet, while conveniently being precisely what Microsoft is already shipping. That raises a lot of concerns.

He goes on to examine three categories of “showstopper problems” and gives examples in each. The final category, “Unique Features”, is quite damning in its final analysis…

Probably the most famous example is one of the optional settings provided in OOXML. The setting is called “useWord97LineBreakRules”, and it specifies to use the line-break rules that were used in Word ’97 for East Asian documents. Much like the previous examples, this is of course impossible for anyone else to do, as no specification of these rules is provided. In fact, the OOXML standard even warns implementers not to implement this:

The OOXML standard’s guidance for useWord97LineBreakRules

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

This guidance is excellent. Given that there is no specification available of this feature, and it is deprecated, it makes all kinds of sense for people not to implement it. But wait; if it shouldn’t be implemented, why is it in the spec? Compatibility with existing documents is not a reason to add a feature to a standard aimed at interchanging data; users are worried about whether their text can be opened at all in another program, not whether every line break is in the exact same location!

This feature is in the spec because OOXML is not a document interchange format; it’s a careful, bit-for-bit, replication of Microsoft’s historical binary formats, wrapped up in angle brackets.

That’s a cracking analysis. OOXML is NOT a document interchange format. It’s MS Office binary wrapped in XML.
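To make the point concrete: the most a conforming reader can do with this flag is notice it and fall back to its own rules. The Python sketch below is our own illustration. It assumes the flag is stored as a w:useWord97LineBreakRules element (with an optional on/off w:val attribute) somewhere in a document’s word/settings.xml, in the WordprocessingML main namespace shown; the sample settings fragment is hand-made and minimal.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in OOXML word processing markup).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
OFF_VALUES = {"0", "false", "off"}


def word97_line_breaks_requested(settings_xml: str) -> bool:
    """True if the settings part switches on useWord97LineBreakRules."""
    root = ET.fromstring(settings_xml)
    flag = root.find(f".//{{{W_NS}}}useWord97LineBreakRules")
    if flag is None:
        return False
    # An on/off element with no w:val attribute means "on".
    return flag.get(f"{{{W_NS}}}val", "true").lower() not in OFF_VALUES


# Hand-made minimal example of a settings part that sets the flag.
SAMPLE_SETTINGS = f"""<w:settings xmlns:w="{W_NS}">
  <w:compat><w:useWord97LineBreakRules/></w:compat>
</w:settings>"""

if word97_line_breaks_requested(SAMPLE_SETTINGS):
    # There is nothing to implement: the rules are not specified anywhere,
    # so the only option is to warn and use the application's own line breaking.
    print("warning: Word 97 East Asian line-break rules requested; ignoring")
```

Detecting the flag is the easy part; honouring it would require reproducing Word 97’s behaviour, which, as the guidance itself admits, is never written down.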

Peter’s conclusion says it all.

OOXML is a credible effort to solve a real problem: The problem of how to replace completely opaque binary files encoding ten years of accreted behaviour with partially-legible XML files encoding the same behaviour, down to the last bit. That problem, unfortunately, is not the problem of providing a usable, implementable, exchange format for office documents.

OOXML should not, and must not become an ISO standard. It is, as we have been saying all along, a proprietary vendor’s implementation of their proprietary document format. There will be only one beneficiary if this becomes a global standard, and it isn’t you or me…
