Is there a simple way to take a pandas/df table:
field_1 field_2 field_3 field_4 cat 15,263 2.52 00:03:00 dog 1,652 3.71 00:03:47 test 312 3.27 00:03:41 book 300 3.46 00:02:40
And convert it to XML along the lines of:
<item>
<field name="field_1">cat</field>
<field name="field_2">15263</field>
<field name="filed_3">2.52</field>
...
<item>
<field name="field_1">dog</field>
etc.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You can create a function that creates the item node from a row in your DataFrame:
def func(row):
xml = ['<item>']
for field in row.index:
xml.append(' <field name="{0}">{1}</field>'.format(field, row[field]))
xml.append('</item>')
return 'n'.join(xml)
And then apply the function along the axis=1.
>>> print 'n'.join(df.apply(func, axis=1)) <item> <field name="field_1">cat</field> <field name="field_2">15,263</field> <field name="field_3">2.52</field> <field name="field_4">00:03:00</field> </item> <item> <field name="field_1">dog</field> <field name="field_2">1,652</field> <field name="field_3">3.71</field> <field name="field_4">00:03:47</field> </item> ...
Method 2
To expand on Viktor’s excellent answer (and tweaking it slightly to work with duplicate columns), you could set this up as a to_xml DataFrame method:
def to_xml(df, filename=None, mode='w'):
def row_to_xml(row):
xml = ['<item>']
for i, col_name in enumerate(row.index):
xml.append(' <field name="{0}">{1}</field>'.format(col_name, row.iloc[i]))
xml.append('</item>')
return 'n'.join(xml)
res = 'n'.join(df.apply(row_to_xml, axis=1))
if filename is None:
return res
with open(filename, mode) as f:
f.write(res)
pd.DataFrame.to_xml = to_xml
Then you can print the xml:
In [21]: print df.to_xml() <item> <field name="field_1">cat</field> <field name="field_2">15,263</field> <field name="field_3">2.52</field> <field name="field_4">00:03:00</field> </item> <item> ...
or save it to a file:
In [22]: df.to_xml('foo.xml')
Obviously this example should be tweaked to fit your xml standard.
Method 3
You can use the xml.etree.ElementTree package to generate a read-friendly format in a very few lines of code.
root = etree.Element('data');
for i,row in dframe.iterrows():
item = etree.SubElement(root, 'item', attrib=row.to_dict());
etree.dump(root);
This will create a XML Tree (under root), where each row will will be of type item, and have attributes for all columns. You can create a more nested tree with columns as well by creating a subelement for each field.
Then you can also read the xml file back in Python using the ElementTree package:
xml.etree.ElementTree.parse('xml_file.xml');
Method 4
As of v1.3, you can simply use:
df.to_xml()
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0