In Pandas, does .iloc method give a copy or view?

The result seems a little random to me: sometimes it’s a copy, sometimes it’s a view. For example:

import pandas as pd

df = pd.DataFrame([{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}],
                  index=['student1', 'student2'],
                  columns=['age', 'name'])  # fixes the column order shown below

df
              age   name
   student1   21  Marry
   student2   24   John

Now, let me try to modify it a little.

df2 = df.loc['student1']
df2.iloc[0] = 23   # set the first value (age) by position
df
              age   name
   student1   21  Marry
   student2   24   John

As you can see, nothing changed. df2 is a copy. However, if I add another student into the dataframe…

df.loc['student3'] = ['old','Tom']
df
               age   name
    student1   21  Marry
    student2   24   John
    student3  old    Tom

Now try to change the age again:

df3 = df.loc['student1']
df3.iloc[0] = 33
df
               age   name
    student1   33  Marry
    student2   24   John
    student3  old    Tom

Now df3 suddenly became a view. What is going on? I guess the value ‘old’ is the key?

Answers:


Method 1

You are starting with a DataFrame that has two columns with two different dtypes:

df.dtypes
Out: 
age      int64
name    object
dtype: object

Since different dtypes are stored in different NumPy arrays under the hood, you have two different blocks for them (DataFrame.blocks, used below, only exists in old pandas versions):

df.blocks

Out: 
{'int64':           age
 student1   21
 student2   24, 'object':            name
 student1  Marry
 student2   John}
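Current pandas has no public way to list the blocks, but a rough proxy is the number of distinct column dtypes, since each dtype needs its own array. A minimal sketch (pandas may also keep several blocks of the same dtype until it consolidates them, so treat this only as an approximation):

```python
import pandas as pd

df = pd.DataFrame(
    [{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}],
    index=['student1', 'student2'],
    columns=['age', 'name'],  # fix the column order used in the question
)

# Each distinct dtype must be stored in its own block (NumPy array).
n_dtypes = df.dtypes.nunique()
print(df.dtypes)
print(n_dtypes)  # 2: one integer column, one text column
```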

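If you select the first row of this DataFrame, pandas has to take one value from each block, which makes a copy necessary. In the pandas version used here, df2.is_copy confirms it: it holds a weak reference to the DataFrame the copy was taken from (this attribute has since been removed from pandas).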

df2.is_copy
Out: <weakref at 0x7fc4487a9228; to 'DataFrame' at 0x7fc4488f9dd8>
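The copy is forced by the memory layout rather than by pandas being inconsistent. The NumPy sketch below mimics it with two hypothetical per-dtype arrays (ages and names stand in for the int64 and object blocks): a row that takes one value from each array can only exist as newly allocated memory.

```python
import numpy as np

# Hypothetical stand-ins for pandas' two blocks: one array per dtype.
ages = np.array([21, 24], dtype=np.int64)
names = np.array(['Marry', 'John'], dtype=object)

# A row needs one value from *each* array, so it must be assembled into
# new memory (dtype=object so it can hold an int and a string).
row = np.array([ages[0], names[0]], dtype=object)
row[0] = 23

print(row[0], ages[0])              # 23 21: the row changed, the block did not
print(np.shares_memory(row, ages))  # False
```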

In the second attempt, you are changing the dtypes. Since ‘old’ cannot be stored in an integer array, pandas casts the age column to object dtype:

df.loc['student3'] = ['old','Tom']

df.dtypes
Out: 
age     object
name    object
dtype: object
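The upcast can be reproduced directly. A sketch assuming a reasonably recent pandas; the dtype reported for name may differ between versions (newer releases can infer a dedicated string dtype), but age has to become object because it now mixes integers and a string:

```python
import pandas as pd

df = pd.DataFrame(
    [{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}],
    index=['student1', 'student2'],
    columns=['age', 'name'],
)
age_before = df['age'].dtype

# Enlarge with a row whose 'age' is a string: int64 cannot hold 'old',
# so pandas upcasts the whole column to object.
df.loc['student3'] = ['old', 'Tom']
age_after = df['age'].dtype

print(age_before, age_after)  # int64 object
```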

Now all data for this DataFrame is stored in a single block (and in a single numpy array):

df.blocks

Out: 
{'object':           age   name
 student1   21  Marry
 student2   24   John
 student3  old    Tom}

At this step, slicing the first row can be done on the numpy array without creating a copy, so it returns a view.

df3._is_view
Out: True
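Once everything lives in one object array, the row is just a basic slice of it. Below is a simplified NumPy stand-in for that single block (pandas' real internal layout differs in detail), showing that a write through the row reaches the parent:

```python
import numpy as np

# Single dtype: all values fit in one 2-D object array (one "block").
block = np.array([[21, 'Marry'],
                  [24, 'John'],
                  ['old', 'Tom']], dtype=object)

# Basic indexing of one row returns a view on the same buffer,
# so writing through the row changes the parent array.
row = block[0]
row[0] = 33

print(block[0, 0])                   # 33
print(np.shares_memory(row, block))  # True
```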

Method 2

In general, you can get a view if the data-frame has a single dtype, which is not the case with your original data-frame:

In [4]: df
Out[4]:
          age   name
student1   21  Marry
student2   24   John

In [5]: df.dtypes
Out[5]:
age      int64
name    object
dtype: object

However, when you do:

In [6]: df.loc['student3'] = ['old','Tom']
   ...:

The first column gets coerced to object, since a column can hold only one dtype:

In [7]: df.dtypes
Out[7]:
age     object
name    object
dtype: object

In this case (in the pandas versions this answer was written for), .values returns an array backed by the same underlying buffer every time, so changes to that array are reflected in the data-frame:

In [11]: vals = df.values

In [12]: vals
Out[12]:
array([[21, 'Marry'],
       [24, 'John'],
       ['old', 'Tom']], dtype=object)

In [13]: vals[0,0] = 'foo'

In [14]: vals
Out[14]:
array([['foo', 'Marry'],
       [24, 'John'],
       ['old', 'Tom']], dtype=object)

In [15]: df
Out[15]:
          age   name
student1  foo  Marry
student2   24   John
student3  old    Tom
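The shared buffer can also be checked without writing through it: two calls to to_numpy() on a single-dtype frame return arrays over the same memory. A sketch using a hypothetical all-integer frame (the score column is made up for the example). Note that with Copy-on-Write enabled (the default in pandas 3.0) such arrays are read-only, so the in-place write shown above raises an error instead of modifying the data-frame:

```python
import numpy as np
import pandas as pd

# Hypothetical all-int64 frame: one dtype, so pandas keeps one block.
df = pd.DataFrame(np.array([[21, 80], [24, 95]], dtype=np.int64),
                  index=['student1', 'student2'],
                  columns=['age', 'score'])

a = df.to_numpy()
b = df.to_numpy()

# Both arrays are views of the same underlying block.
print(np.shares_memory(a, b))  # True
```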

On the other hand, with mixed dtypes, as in your original data-frame, .values has to build a new array, so changes to it do not reach the data-frame:

In [26]: df = pd.DataFrame([{'name':'Marry', 'age':21},{'name':'John','age':24}],
    ...:                   index=['student1','student2'], columns=['age','name'])
    ...:

In [27]: vals = df.values

In [28]: vals
Out[28]:
array([[21, 'Marry'],
       [24, 'John']], dtype=object)

In [29]: vals[0,0] = 'foo'

In [30]: vals
Out[30]:
array([['foo', 'Marry'],
       [24, 'John']], dtype=object)

In [31]: df
Out[31]:
          age   name
student1   21  Marry
student2   24   John
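The mixed-dtype case can be checked the same way: each call has to interleave the int64 and text columns into a fresh object array, so two calls never share memory and writing to one leaves the data-frame alone. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}],
    index=['student1', 'student2'],
    columns=['age', 'name'],
)

# Mixed dtypes: every call interleaves the columns into a new object array.
a = df.to_numpy()
b = df.to_numpy()
print(a.dtype)                 # object
print(np.shares_memory(a, b))  # False

a[0, 0] = 'foo'                   # only changes the freshly built array
print(df.loc['student1', 'age'])  # 21
```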

Note, however, that a view is only returned when one is possible, i.e. when the selection is a basic slice; otherwise, for example when rows are picked with a list of labels, a copy is made regardless of the dtypes:

In [39]: df.loc['student3'] = ['old','Tom']


In [40]: df2
Out[40]:
          name
student3   Tom
student2  John

In [41]: df2.loc[:] = 'foo'

In [42]: df2
Out[42]:
         name
student3  foo
student2  foo

In [43]: df
Out[43]:
          age   name
student1   21  Marry
student2   24   John
student3  old    Tom
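The transcript does not show how df2 was created. Assuming it was a list-based selection (sub below is a hypothetical one that yields the same rows and column), the copy can be reproduced like this: indexing with lists of labels gathers the rows into new memory, so the parent is untouched regardless of the dtypes.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [21, 24, 'old'], 'name': ['Marry', 'John', 'Tom']},
    index=['student1', 'student2', 'student3'],
)

# Lists of labels trigger "fancy" (take-based) indexing: the selected
# rows are gathered into a brand-new object, i.e. always a copy.
sub = df.loc[['student3', 'student2'], ['name']]
sub.loc[:, 'name'] = 'foo'

print(sub)
print(df.loc['student3', 'name'])  # Tom: the parent is untouched
```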


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
