I have some data that looks like this:
c stuff c more header c begin data 1 1:.5 1 2:6.5 1 3:5.3
I want to import it into a 3 column data frame, with columns e.g.
a , b, c 1, 1, 0.5 etc
I have been trying to read in the data as 2 columns split on ‘:’, and then to split the first column on ‘ ‘. However I’m finding it irksome.
Is there a better way to sort it out on import directly?
currently:
data1 = pd.read_csv(file_loc, skiprows = 3, delimiter = ':', names = ['AB', 'C'])
data2 = pd.DataFrame(data1.AB.str.split(' ',1).tolist(), names = ['A','B'])
However this is further complicated by the fact my data has a leading space…
I feel like this should be a simple task, but currently I’m thinking of reading it line by line and using some find replace to sanitise the data before importing.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
One way might be to use the regex separators permitted by the python engine. For example:
>>> !cat castle.dat
c stuff
c more header
c begin data
1 1:.5
1 2:6.5
1 3:5.3
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'],
sep=' |:', engine='python')
>>> df
a b c
0 1 1 0.5
1 1 2 6.5
2 1 3 5.3
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0