I have to read several files, some in Excel format and some in CSV format. Some of the files have hundreds of columns.
Is there a way to select several ranges of columns without specifying all the column names or positions? For example, something like selecting columns 1-10, 15, 17, and 50-100:
df = df.ix[1:10, 15, 17, 50:100]
I need to know how to do this both when creating a DataFrame from Excel and CSV files and after the DataFrame has been created.
Answers:
Method 1
Use np.r_, which concatenates slices and scalars into a single array of integer positions:
np.r_[1:10, 15, 17, 50:100]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 17, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
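So you can pass that array to iloc. Positions are zero-based and the end of each slice is excluded, so 1:10 covers positions 1 through 9: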
df.iloc[:, np.r_[1:10, 15, 17, 50:100]]
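The question also asks about selecting the columns while reading the file. A minimal sketch of both cases follows, assuming a made-up 120-column frame (the names wide, cols, subset, and from_csv are illustrative, not from the original answer). It relies on the usecols parameter of pd.read_csv, which accepts a list of integer positions; pd.read_excel takes a usecols list of integers in the same way.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: 120 columns named c0..c119, three rows.
wide = pd.DataFrame(
    np.arange(360).reshape(3, 120),
    columns=[f"c{i}" for i in range(120)],
)

# Zero-based positions; slice ends are exclusive, as in plain Python.
cols = np.r_[1:10, 15, 17, 50:100]

# Case 1: select after the DataFrame exists.
subset = wide.iloc[:, cols]

# Case 2: select while reading. An in-memory CSV stands in for a real file.
csv_text = wide.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text), usecols=cols.tolist())
```

If "columns 1-10" is meant in the one-based, inclusive sense, the positions would be np.r_[0:10, 14, 16, 49:100] instead.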
Method 2
Use pd.concat with an inner join, for example:
result = pd.concat([df1, df4], axis=1, join="inner")
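The answer does not say what df1 and df4 are. One possible reading, sketched here as an assumption, is that each piece is a positional slice of the same frame and the pieces are then joined side by side. The names df, part_a, part_b, and part_c are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 120 columns named c0..c119.
df = pd.DataFrame(
    np.arange(240).reshape(2, 120),
    columns=[f"c{i}" for i in range(120)],
)

# Slice each range separately (positions are zero-based, ends exclusive).
part_a = df.iloc[:, 1:10]
part_b = df.iloc[:, [15, 17]]
part_c = df.iloc[:, 50:100]

# axis=1 places the pieces side by side; join="inner" keeps only the row
# labels present in every piece (all rows here, since the index is shared).
result = pd.concat([part_a, part_b, part_c], axis=1, join="inner")
```

Under this reading the result has the same columns as the np.r_ selection in Method 1, at the cost of spelling out each slice.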
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.