Read an ODS file with Python and Odfpy

Odfpy is a python library to read and write OpenDocument documents (such as the .odt or.ods created with OpenOffice.org). However, the documentation and examples shipped with Odfpy are more oriented to writing new documents rather than reading existing ones.

Failing to find any simple spreadsheet reading code snippet on the internet, I wrote a simple ODS reader in python that reads an entire .ods file in a dictionary of sheets, where each sheets is stored as an array of arrays (rows, columns). It still requires Odfpy to run.

It has been tested with odfpy 0.9.3 and python 2.7, using ods files created with OpenOffice.org. Here’s the script: Odf to array Python script.

Usage example (thanks to gtr):

from ODSReader import *
doc = ODSReader("films.ods")
table = doc.getSheet("Sheet1")
firstRow = table[0]
firstCellOfFirstRow = firstRow[0]

EDIT: fixed to handle “number-columns-repeated” properly, added usage example, included fixes by gtr

This entry was posted in code snippets and tagged . Bookmark the permalink.

19 Responses to Read an ODS file with Python and Odfpy

  1. d l says:

    Thanks — I have many spreadsheets from past research and was looking for how to get one cell value out of each of them, this might do it.
    But I am a bit new to Python — do you have an example how to use this class to get a cell value from a particular sheet? — thanks

    • gtr says:

      Quick example:

      from ODSReader import *
      doc = ODSReader("films.ods")
      table = doc.getSheet("Sheet1")
      firstRow = table[0]
      firstCellOfFirstRow = firstRow[0]

      If you encounter bugs, you might want to try applying the fixes I’ve mentioned below.

  2. gtr says:

    Thanks for this! Simplifies what I’m doing quite a bit.

    When I tried it out, I came across a few bugs:

    1. Your code assumes all childNodes of doc.spreadsheet are valid Tables. One of mine isn’t, it’s some other type of element, and your code throws an error expecting it to have a name attribute when it doesn’t.

    Fix – change line 25 from:

    for sheet in self.doc.spreadsheet.childNodes:

    to:

    for sheet in self.doc.spreadsheet.getElementsByType(Table):

    2. Your code assumes each P has a single childNode which is a textNode. At least one of mine has multiple children, which are a mixture of childnodes and the occasional or suchlike. Your code was throwing errors expecting the elements to have a data attribute.
    Fix – change lines 52 and 53 from:

    text = str(p.firstChild.data)
    textContent = textContent + text

    to:

    for n in p.childNodes:
    if (n.nodeType == 3):
    textContent = textContent + str(n.data)

    3. I’m not sure if this is really a bug, but on my system, because ascii is the default encoding, applying str to the unicode data in the text nodes sometimes caused errors with “not in range” characters. So I changed my updated line 54 that I’d inserted as a result of bug 2 from:

    textContent = textContent + str(n.data)

    to:

    textContent = textContent + n.data

    …so that the resulting array contains unicode data. I’m aware that I’m only postponing my encoding woes to a later stage of the process, but still. So I guess this one could be a matter of personal taste.

    Hope that helps someone….

    • gtr says:

      Description of bug 2 should read:

      “Your code assumes each P has a single childNode which is a textNode. At least one of mine has multiple children, which are a mixture of childnodes and the occasional text:s or suchlike. Your code was throwing errors expecting the text:s elements to have a data attribute.”

      …the blog software stripped out the “text:s” bits because I put angle brackets on them and it thought it was a HTML tag.

    • marco83 says:

      I updated the code to include your fixes. Regarding the unicode problem, I replaced the str() function with the unicode() function, hope it works for you.
      Thanks for your contribution!
      Take care

  3. DirkR says:

    Thanks a lot for this module, it really made working with my ODS spreadsheet a lot easier!

    However, I had a few problems with it…

    The big one was that it totally ignores empty cells/rows. So if you have empty cells/rows in your ODS all indices are going to be totally wrong.

    The other problem was that it did not recognize some of the text fields in my sheets. Those were represented not as Text nodes, but as text:span elements with Text children.

    I changed the code to include empty cells/rows and to accept the text:span elements. There is a diff and the full version online.

    Thanks again!

    Dirk

    P.S.: Why is it not called ODSReader.py? Given that that’s the exported class it seems to be the logical choice…

  4. Adam says:

    Why not put this on github? I’d love to branch it and add a few tweaks/features.

  5. AnojiRox says:

    the odfpy server is down :(
    cannot download

  6. a says:

    Thank you!
    It is very very useful, although quite slow (compared to reading a text file with the same data).

  7. alfred says:

    I found a “problem” when using this script: how can one control the accuracy of numbers imported from the spreadsheet?

  8. cgebbe says:

    Indeed, very helpful, thanks to you, macro and also Dirk!

  9. kommradhomer says:

    This is an amazing tool for reading the rows into python environment thus we can work easily on it. Thanks a lot!

  10. Empty cells in a row are skipped: this makes reading the contents of a specific cell in a row difficult.

    A solution to this problem is to remove “if textContent” and to use “not textContent.startswith(‘#’)” instead of “textContent[0] == ‘#’”. Empty cells are then simply reported as containing the empty string.

    I am not sure if this break anything, though…

    • marco83 says:

      Hi Eric,
      thanks for the contribution. It’s a design decision based on what I needed the script for (ignore empty cells vs. empty cells are empty string). Your solution should work, but I haven’t tried myself.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>