I'm trying to read a CSV file into a pandas DataFrame and select a column, but I keep getting a KeyError.
The file reads in successfully and I can view the DataFrame in an IPython notebook, but when I try to select any column other than the first one, it throws a KeyError.
I am using this code:
import pandas as pd
transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
transactions['quarter']
This is the file I’m working on:
https://www.dropbox.com/s/81iwm4f2hsohsq3/transactions.csv?dl=0
Thank you!
Answers:
Method 1
Use sep=r'\s*,\s*' so that the whitespace around each comma is consumed by the separator and does not end up in the column names (a regex separator requires engine='python'):
transactions = pd.read_csv('transactions.csv', sep=r'\s*,\s*',
                           header=0, encoding='ascii', engine='python')
Alternatively, make sure that your CSV file has no unquoted spaces after the commas and use your original command unchanged.
Proof:
print(transactions.columns.tolist())
Output:
['product_id', 'customer_id', 'store_id', 'promotion_id', 'month_of_year', 'quarter', 'the_year', 'store_sales', 'store_cost', 'unit_sales', 'fact_count']
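To make the effect of the regex separator easier to see without the original file, here is a minimal sketch on an in-memory CSV. The sample rows in raw are made up for illustration and only mimic the "comma followed by a space" layout described in this thread; they are not taken from transactions.csv.

```python
import io
import pandas as pd

# Hypothetical sample data: every comma is followed by a space,
# as in the header of the file discussed above.
raw = "product_id, customer_id, quarter\n1, 10, Q1\n2, 20, Q2\n"

# Default parsing keeps the leading space as part of each column name.
plain = pd.read_csv(io.StringIO(raw))

# The regex separator consumes the whitespace around each comma.
# Regex separators are only supported by the python engine.
cleaned = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")

print(plain.columns.tolist())    # names after the first carry a leading space
print(cleaned.columns.tolist())  # names are clean
print(cleaned["quarter"].tolist())
```

The python engine is generally slower than the default C engine, so for large files one of the other methods may be preferable.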
Method 2
If you need to select multiple columns from a DataFrame, use two pairs of square brackets, e.g.:
df[["product_id","customer_id","store_id"]]
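As a rough illustration of why the double brackets matter, the sketch below builds a small DataFrame (the values are invented; only the column names come from this thread) and compares the three ways of indexing. Passing two names inside a single pair of brackets makes pandas look for one column keyed by a tuple, which is another common source of a KeyError.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
df = pd.DataFrame({
    "product_id": [1, 2],
    "customer_id": [10, 20],
    "store_id": [100, 200],
})

single = df["product_id"]                   # one label -> Series
subset = df[["product_id", "customer_id"]]  # list of labels -> DataFrame

# Single brackets with two names is a lookup of the tuple
# ("product_id", "customer_id"), which is not a column here.
try:
    df["product_id", "customer_id"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

print(type(single).__name__, type(subset).__name__, tuple_lookup_failed)
```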
Method 3
I ran into the same problem: a KeyError when selecting columns after reading from a CSV file.
Reason
The main cause is the extra leading whitespace after each comma in your CSV file (found in the header of your uploaded file, e.g. , customer_id, store_id, promotion_id, month_of_year, ).
Proof
To confirm this, try print(list(df.columns)); the column names will be ['product_id', ' customer_id', ' store_id', ' promotion_id', ' month_of_year', ...].
Solution
The direct fix is to pass skipinitialspace=True to pd.read_csv(), for example:
pd.read_csv('transactions.csv',
            sep=',',
            skipinitialspace=True)
Reference: https://stackoverflow.com/a/32704818/16268870
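The failure and the fix can be reproduced side by side on a small in-memory CSV. This is only a sketch: the rows in raw are invented and merely imitate the spacing described above, and the names df_default, df_fixed and lookup_failed are illustrative.

```python
import io
import pandas as pd

# Hypothetical sample with a space after every comma.
raw = "product_id, customer_id, quarter\n1, 10, Q1\n2, 20, Q2\n"

# Default read: the column is actually named " quarter", so "quarter" is missing.
df_default = pd.read_csv(io.StringIO(raw))
try:
    df_default["quarter"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# skipinitialspace=True drops the whitespace that follows each delimiter,
# in the header and in the data rows alike.
df_fixed = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(lookup_failed)
print(df_fixed.columns.tolist())
print(df_fixed["quarter"].tolist())
```

Unlike a regex separator, this option works with the default C parser.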
Method 4
A KeyError generally occurs when the key doesn't match any of the DataFrame's column names exactly.
You could also strip the whitespace from the column names after loading:
import pandas as pd

filename = 'transactions.csv'  # the file from the question
df = pd.read_csv(filename, delimiter=",")
# Remove leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()
print(df.columns)
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.