Odfpy is a python library to read and write OpenDocument documents (such as the .odt or.ods created with OpenOffice.org). However, the documentation and examples shipped with Odfpy are more oriented to writing new documents rather than reading existing ones.
Failing to find any simple spreadsheet reading code snippet on the internet, I wrote a simple ODS reader in python that reads an entire .ods file in a dictionary of sheets, where each sheets is stored as an array of arrays (rows, columns). It still requires Odfpy to run.
It has been tested with odfpy 0.9.3 and python 2.7, using ods files created with OpenOffice.org. Here’s the script: Odf to array Python script.
Usage example (thanks to gtr):
from ODSReader import *
doc = ODSReader("films.ods")
table = doc.getSheet("Sheet1")
firstRow = table[0]
firstCellOfFirstRow = firstRow[0]
EDIT: fixed to handle “number-columns-repeated” properly, added usage example, included fixes by gtr

Thanks — I have many spreadsheets from past research and was looking for how to get one cell value out of each of them, this might do it.
But I am a bit new to Python — do you have an example how to use this class to get a cell value from a particular sheet? — thanks
Quick example:
from ODSReader import *
doc = ODSReader("films.ods")
table = doc.getSheet("Sheet1")
firstRow = table[0]
firstCellOfFirstRow = firstRow[0]
If you encounter bugs, you might want to try applying the fixes I’ve mentioned below.
Thanks for this! Simplifies what I’m doing quite a bit.
When I tried it out, I came across a few bugs:
1. Your code assumes all childNodes of doc.spreadsheet are valid Tables. One of mine isn’t, it’s some other type of element, and your code throws an error expecting it to have a name attribute when it doesn’t.
Fix – change line 25 from:
for sheet in self.doc.spreadsheet.childNodes:
to:
for sheet in self.doc.spreadsheet.getElementsByType(Table):
2. Your code assumes each P has a single childNode which is a textNode. At least one of mine has multiple children, which are a mixture of childnodes and the occasional or suchlike. Your code was throwing errors expecting the elements to have a data attribute.
Fix – change lines 52 and 53 from:
text = str(p.firstChild.data)
textContent = textContent + text
to:
for n in p.childNodes:
if (n.nodeType == 3):
textContent = textContent + str(n.data)
3. I’m not sure if this is really a bug, but on my system, because ascii is the default encoding, applying str to the unicode data in the text nodes sometimes caused errors with “not in range” characters. So I changed my updated line 54 that I’d inserted as a result of bug 2 from:
textContent = textContent + str(n.data)
to:
textContent = textContent + n.data
…so that the resulting array contains unicode data. I’m aware that I’m only postponing my encoding woes to a later stage of the process, but still. So I guess this one could be a matter of personal taste.
Hope that helps someone….
Description of bug 2 should read:
“Your code assumes each P has a single childNode which is a textNode. At least one of mine has multiple children, which are a mixture of childnodes and the occasional text:s or suchlike. Your code was throwing errors expecting the text:s elements to have a data attribute.”
…the blog software stripped out the “text:s” bits because I put angle brackets on them and it thought it was a HTML tag.
Thanks gtr, your comment is really helpful. I will fix the code as soon as I have some free time to test it!
Wow, that was an impressively fast reply; nice one :)
It’s not fixed yet, I just moved the code to an attached file, I’ll fix in the next days
Yet another typo:
“…mixture of childnodes…” -> “…mixture of textnodes…”
I updated the code to include your fixes. Regarding the unicode problem, I replaced the str() function with the unicode() function, hope it works for you.
Thanks for your contribution!
Take care
Thanks a lot for this module, it really made working with my ODS spreadsheet a lot easier!
However, I had a few problems with it…
The big one was that it totally ignores empty cells/rows. So if you have empty cells/rows in your ODS all indices are going to be totally wrong.
The other problem was that it did not recognize some of the text fields in my sheets. Those were represented not as Text nodes, but as text:span elements with Text children.
I changed the code to include empty cells/rows and to accept the text:span elements. There is a diff and the full version online.
Thanks again!
Dirk
P.S.: Why is it not called ODSReader.py? Given that that’s the exported class it seems to be the logical choice…
Why not put this on github? I’d love to branch it and add a few tweaks/features.
Good suggestion, I will when I have some time (that might happen only in a while :P)
the odfpy server is down :(
cannot download
Thank you!
It is very very useful, although quite slow (compared to reading a text file with the same data).
I found a “problem” when using this script: how can one control the accuracy of numbers imported from the spreadsheet?
Indeed, very helpful, thanks to you, macro and also Dirk!
This is an amazing tool for reading the rows into python environment thus we can work easily on it. Thanks a lot!
Empty cells in a row are skipped: this makes reading the contents of a specific cell in a row difficult.
A solution to this problem is to remove “if textContent” and to use “not textContent.startswith(‘#’)” instead of “textContent[0] == ‘#’”. Empty cells are then simply reported as containing the empty string.
I am not sure if this break anything, though…
Hi Eric,
thanks for the contribution. It’s a design decision based on what I needed the script for (ignore empty cells vs. empty cells are empty string). Your solution should work, but I haven’t tried myself.