I have a NumPy array built from a list of lists, representing a two-dimensional array with row labels and column names as shown below:
data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]])
I’d like the resulting DataFrame to have Row1 and Row2 as index values, and Col1 and Col2 as header values.
I can specify the index as follows:
df = pd.DataFrame(data, index=data[:,0])
However, I am unsure how best to assign column headers.
Answers:
Method 1
You need to pass data, index and columns to the DataFrame constructor, as in:
>>> pd.DataFrame(data=data[1:,1:],    # values
...              index=data[1:,0],    # 1st column as index
...              columns=data[0,1:])  # 1st row as the column names
Edit: as @joris notes in the comments, you may need to change data[1:,1:] above to np.int_(data[1:,1:]) to get the correct data type.
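For reference, here is a minimal self-contained sketch of this approach applied to the array from the question. Because the labels and the numbers share one array, NumPy stores everything as strings, so the values need converting. I use astype(int) here instead of np.int_; that substitution is my own choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

# The array from the question; mixing labels and numbers yields a string array.
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])

df = pd.DataFrame(data=data[1:, 1:].astype(int),  # values, converted from strings
                  index=data[1:, 0],              # 1st column as index
                  columns=data[0, 1:])            # 1st row as the column names
print(df)
print(df.dtypes)
```

Without the conversion the columns would hold strings rather than integers.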
Method 2
Here is an easy-to-understand solution:
import numpy as np
import pandas as pd
# Creating a 2 dimensional numpy array
>>> data = np.array([[5.8, 2.8], [6.0, 2.2]])
>>> data
array([[5.8, 2.8],
       [6. , 2.2]])
# Creating pandas dataframe from numpy array
>>> dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]})
>>> print(dataset)
   Column1  Column2
0      5.8      2.8
1      6.0      2.2
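If the array has many columns, slicing each one into a dict gets tedious. A possible shortcut, sketched here with the same two-column array, is to pass the whole array plus a columns list:

```python
import numpy as np
import pandas as pd

data = np.array([[5.8, 2.8], [6.0, 2.2]])

# One name per array column, in order.
dataset = pd.DataFrame(data, columns=['Column1', 'Column2'])
print(dataset)
```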
Method 3
I agree with Joris; it seems like you should be doing this differently, like with numpy record arrays. Modifying “option 2” from this great answer, you could do it like this:
import pandas
import numpy
dtype = [('Col1','int32'), ('Col2','float32'), ('Col3','float32')]
values = numpy.zeros(20, dtype=dtype)
index = ['Row'+str(i) for i in range(1, len(values)+1)]
df = pandas.DataFrame(values, index=index)
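To show why a record (structured) array helps, here is a small sketch with made-up values in place of the zeros. The field names become column names and each column keeps its own dtype; the two-row data and the Row1/Row2 labels are only for illustration.

```python
import numpy as np
import pandas as pd

# Each field has its own name and dtype.
dtype = [('Col1', 'int32'), ('Col2', 'float32')]
values = np.zeros(2, dtype=dtype)
values['Col1'] = [1, 3]        # illustrative values
values['Col2'] = [2.0, 4.0]

df = pd.DataFrame(values, index=['Row1', 'Row2'])
print(df)
print(df.dtypes)
```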
Method 4
This can be done simply with the from_records method of the pandas DataFrame:
import numpy as np
import pandas as pd
# Creating a numpy array
x = np.arange(1,10,1).reshape(-1,1)
dataframe = pd.DataFrame.from_records(x)
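from_records also accepts a columns argument, so the headers can be set in the same call. A small sketch, assuming a 3x3 array and column names a, b, c that I picked for illustration:

```python
import numpy as np
import pandas as pd

# A 3x3 array of the numbers 1..9
x = np.arange(1, 10).reshape(3, 3)

# Hypothetical column names, one per array column
dataframe = pd.DataFrame.from_records(x, columns=['a', 'b', 'c'])
print(dataframe)
```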
Method 5
>>> import pandas as pd
>>> import numpy as np
>>> data.shape
(480,193)
>>> type(data)
numpy.ndarray
>>> df=pd.DataFrame(data=data[0:,0:],
...                 index=[i for i in range(data.shape[0])],
...                 columns=['f'+str(i) for i in range(data.shape[1])])
>>> df.head()
Method 6
Adding to @behzad.nouri's answer: we can create a helper routine to handle this common scenario:
import pandas as pd

def csvDf(dat,**kwargs):
    from numpy import array
    data = array(dat)
    # Return None for empty input; otherwise use the first row as
    # column names and the first column as the index.
    if data is None or len(data)==0 or len(data[0])==0:
        return None
    else:
        return pd.DataFrame(data[1:,1:],index=data[1:,0],columns=data[0,1:],**kwargs)
Let’s try it out:
data = [['','a','b','c'],['row1','row1cola','row1colb','row1colc'],
        ['row2','row2cola','row2colb','row2colc'],['row3','row3cola','row3colb','row3colc']]
In [61]: csvDf(data)
Out[61]:
             a         b         c
row1  row1cola  row1colb  row1colc
row2  row2cola  row2colb  row2colc
row3  row3cola  row3colb  row3colc
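One caveat with a labelled table like this: everything comes through as strings, even cells that look numeric. A possible follow-up, sketched below with a small made-up table, is to run pd.to_numeric over each column afterwards. The table contents are illustrative, and this step is my addition rather than part of the helper.

```python
import numpy as np
import pandas as pd

# Illustrative CSV-like table: header row, label column, numeric text cells
table = [['', 'a', 'b'],
         ['row1', '1', '2.5'],
         ['row2', '3', '4.5']]
arr = np.array(table)

df = pd.DataFrame(arr[1:, 1:], index=arr[1:, 0], columns=arr[0, 1:])
df = df.apply(pd.to_numeric)  # convert each column from strings to numbers
print(df.dtypes)
```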
Method 7
Here is a simple example of creating a pandas DataFrame from NumPy arrays.
import numpy as np
import pandas as pd

# create the arrays
var1 = np.arange(start=1, stop=21, step=1).reshape(-1)
var2 = np.random.rand(20,1).reshape(-1)
print(var1.shape)
print(var2.shape)

dataset = pd.DataFrame()
dataset['col1'] = var1
dataset['col2'] = var2
dataset.head()
Method 8
I think this is a simple and intuitive method:
data = np.array([[0, 0], [0, 1] , [1, 0] , [1, 1]])
reward = np.array([1,0,1,0])
dataset = pd.DataFrame()
dataset['StateAttributes'] = data.tolist()
dataset['reward'] = reward.tolist()
dataset
This returns a DataFrame whose StateAttributes column holds one Python list per row.
But there are performance implications, detailed here:
How to set the value of a pandas column as list
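If the list-valued column becomes a problem, one alternative (a sketch, not part of the original answer) is to give each state attribute its own column. The names state0 and state1 are hypothetical.

```python
import numpy as np
import pandas as pd

data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
reward = np.array([1, 0, 1, 0])

# One plain integer column per attribute instead of a column of lists
dataset = pd.DataFrame(data, columns=['state0', 'state1'])  # hypothetical names
dataset['reward'] = reward
print(dataset)
```

The columns stay as ordinary integer columns, so vectorized operations work on them directly.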
Method 9
It’s not so short, but maybe it can help you.
Creating Array
import numpy as np
import pandas as pd
data = np.array([['col1', 'col2'], [4.8, 2.8], [7.0, 1.2]])
>>> data
array([['col1', 'col2'],
       ['4.8', '2.8'],
       ['7.0', '1.2']], dtype='<U4')
Creating data frame
df = pd.DataFrame(i for i in data).transpose()
df.drop(0, axis=1, inplace=True)
df.columns = data[0]
>>> df
  col1 col2
0  4.8  7.0
1  2.8  1.2
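Note that the transpose step puts each original data row into a column, so 4.8 and 2.8 both end up under col1, and the values remain strings. If the goal is to keep the array's own orientation and get floats, a possible sketch (my own variant, not the answerer's method) is:

```python
import numpy as np
import pandas as pd

data = np.array([['col1', 'col2'], [4.8, 2.8], [7.0, 1.2]])

# First row supplies the headers; remaining rows are parsed back to floats.
df = pd.DataFrame(data[1:].astype(float), columns=data[0])
print(df)
```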
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.

