Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?

I have a Numpy array consisting of a list of lists, representing a two-dimensional array with row labels and column names as shown below:

data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]])

I’d like the resulting DataFrame to have Row1 and Row2 as index values, and Col1, Col2 as header values

I can specify the index as follows:

df = pd.DataFrame(data,index=data[:,0]),

however I am unsure how to best assign column headers.

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You need to specify data, index and columns to DataFrame constructor, as in:

>>> pd.DataFrame(data=data[1:,1:],    # values
...              index=data[1:,0],    # 1st column as index
...              columns=data[0,1:])  # 1st row as the column names

edit: as in the @joris comment, you may need to change above to np.int_(data[1:,1:]) to have correct data type.

Method 2

Here is an easy to understand solution

import numpy as np
import pandas as pd

# Creating a 2 dimensional numpy array
>>> data = np.array([[5.8, 2.8], [6.0, 2.2]])
>>> print(data)
>>> data
array([[5.8, 2.8],
       [6. , 2.2]])

# Creating pandas dataframe from numpy array
>>> dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]})
>>> print(dataset)
   Column1  Column2
0      5.8      2.8
1      6.0      2.2

Method 3

I agree with Joris; it seems like you should be doing this differently, like with numpy record arrays. Modifying “option 2” from this great answer, you could do it like this:

import pandas
import numpy

dtype = [('Col1','int32'), ('Col2','float32'), ('Col3','float32')]
values = numpy.zeros(20, dtype=dtype)
index = ['Row'+str(i) for i in range(1, len(values)+1)]

df = pandas.DataFrame(values, index=index)

Method 4

This can be done simply by using from_records of pandas DataFrame

import numpy as np
import pandas as pd
# Creating a numpy array
x = np.arange(1,10,1).reshape(-1,1)
dataframe = pd.DataFrame.from_records(x)

Method 5

    >>import pandas as pd
    >>import numpy as np
    >>data.shape
    (480,193)
    >>type(data)
    numpy.ndarray
    >>df=pd.DataFrame(data=data[0:,0:],
    ...        index=[i for i in range(data.shape[0])],
    ...        columns=['f'+str(i) for i in range(data.shape[1])])
    >>df.head()
    [![array to dataframe][1]][1]

enter image description here

Method 6

Adding to @behzad.nouri ‘s answer – we can create a helper routine to handle this common scenario:

def csvDf(dat,**kwargs): 
  from numpy import array
  data = array(dat)
  if data is None or len(data)==0 or len(data[0])==0:
    return None
  else:
    return pd.DataFrame(data[1:,1:],index=data[1:,0],columns=data[0,1:],**kwargs)

Let’s try it out:

data = [['','a','b','c'],['row1','row1cola','row1colb','row1colc'],
     ['row2','row2cola','row2colb','row2colc'],['row3','row3cola','row3colb','row3colc']]
csvDf(data)

In [61]: csvDf(data)
Out[61]:
             a         b         c
row1  row1cola  row1colb  row1colc
row2  row2cola  row2colb  row2colc
row3  row3cola  row3colb  row3colc

Method 7

Here simple example to create pandas dataframe by using numpy array.

import numpy as np
import pandas as pd

# create an array 
var1  = np.arange(start=1, stop=21, step=1).reshape(-1)
var2 = np.random.rand(20,1).reshape(-1)
print(var1.shape)
print(var2.shape)

dataset = pd.DataFrame()
dataset['col1'] = var1
dataset['col2'] = var2
dataset.head()

Method 8

I think this is a simple and intuitive method:

data = np.array([[0, 0], [0, 1] , [1, 0] , [1, 1]])
reward = np.array([1,0,1,0])

dataset = pd.DataFrame()
dataset['StateAttributes'] = data.tolist()
dataset['reward'] = reward.tolist()

dataset

returns:

Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?

But there are performance implications detailed here:

How to set the value of a pandas column as list

Method 9

It’s not so short, but maybe can help you.

Creating Array

import numpy as np
import pandas as pd

data = np.array([['col1', 'col2'], [4.8, 2.8], [7.0, 1.2]])

>>> data
array([['col1', 'col2'],
       ['4.8', '2.8'],
       ['7.0', '1.2']], dtype='<U4')

Creating data frame

df = pd.DataFrame(i for i in data).transpose()
df.drop(0, axis=1, inplace=True)
df.columns = data[0]
df

>>> df
  col1 col2
0  4.8  7.0
1  2.8  1.2


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x