I find that the result of df.loc seems a little random: sometimes it's a copy, sometimes it's a view. For example:
import pandas as pd
df = pd.DataFrame([{'name':'Marry', 'age':21},{'name':'John','age':24}],index=['student1','student2'])
df
age name
student1 21 Marry
student2 24 John
Now, let me try to modify it a little.
df2 = df.loc['student1']
df2[0] = 23
df
age name
student1 21 Marry
student2 24 John
As you can see, nothing changed, so df2 is a copy. However, if I add another student to the DataFrame:
df.loc['student3'] = ['old','Tom']
df
age name
student1 21 Marry
student2 24 John
student3 old Tom
Now I try to change the age again:
df3 = df.loc['student1']
df3[0] = 33
df
age name
student1 33 Marry
student2 24 John
student3 old Tom
Now df3 has suddenly become a view. What is going on? My guess is that the value 'old' is the key.
Answers:
Method 1
You are starting with a DataFrame that has two columns with two different dtypes:
df.dtypes
Out:
age      int64
name    object
dtype: object
Since different dtypes are stored in different numpy arrays under the hood, you have two different blocks for them:
df.blocks
Out:
{'int64': age
student1 21
student2 24, 'object': name
student1 Marry
student2 John}
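As far as I know, df.blocks is not available in recent pandas releases. As a rough stand-in, the sketch below groups the column names by dtype, which approximates how the columns would be split across blocks. The name columns_by_dtype is my own and not part of pandas, and the real internal layout may differ between versions.

```python
import pandas as pd

df = pd.DataFrame(
    [{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}],
    index=['student1', 'student2'],
)

# Group column names by dtype. Each group roughly corresponds to one
# internal block (an approximation, not pandas' actual block layout).
columns_by_dtype = {}  # hypothetical helper variable, not a pandas API
for column, dtype in df.dtypes.items():
    columns_by_dtype.setdefault(str(dtype), []).append(column)

print(columns_by_dtype)
```

Two groups here means a row selection has to pull values from two separate arrays.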
If you select the first row of this DataFrame, pandas has to take one value from each block, which makes a copy necessary.
df2.is_copy
Out[40]: <weakref at 0x7fc4487a9228; to 'DataFrame' at 0x7fc4488f9dd8>
In the second attempt, you are changing the dtypes. Since 'old' cannot be stored in an integer array, pandas casts the age column to object dtype.
df.loc['student3'] = ['old','Tom']
df.dtypes
Out:
age     object
name    object
dtype: object
Now all data for this DataFrame is stored in a single block (and in a single numpy array):
df.blocks
Out:
{'object': age name
student1 21 Marry
student2 24 John
student3 old Tom}
At this step, slicing the first row can be done on the numpy array without creating a copy, so it returns a view.
df3._is_view
Out: True
Method 2
In general, you can get a view if the DataFrame has a single dtype, which is not the case for your original DataFrame:
In [4]: df
Out[4]:
age name
student1 21 Marry
student2 24 John
In [5]: df.dtypes
Out[5]:
age int64
name object
dtype: object
However, when you do:
In [6]: df.loc['student3'] = ['old','Tom']
The first column gets coerced to object, since a column cannot hold mixed dtypes:
In [7]: df.dtypes
Out[7]:
age     object
name    object
dtype: object
In this case, .values returns an array backed by the same underlying buffer each time, and changes to that array are reflected in the DataFrame:
In [11]: vals = df.values
In [12]: vals
Out[12]:
array([[21, 'Marry'],
[24, 'John'],
['old', 'Tom']], dtype=object)
In [13]: vals[0,0] = 'foo'
In [14]: vals
Out[14]:
array([['foo', 'Marry'],
[24, 'John'],
['old', 'Tom']], dtype=object)
In [15]: df
Out[15]:
age name
student1 foo Marry
student2 24 John
student3 old Tom
On the other hand, with mixed dtypes, as in your original DataFrame, .values has to build a new array, so changes to it are not reflected in the DataFrame:
In [26]: df = pd.DataFrame([{'name':'Marry', 'age':21},{'name':'John','age':24}],
    ...:                   index=['student1','student2'])
In [27]: vals = df.values
In [28]: vals
Out[28]:
array([[21, 'Marry'],
[24, 'John']], dtype=object)
In [29]: vals[0,0] = 'foo'
In [30]: vals
Out[30]:
array([['foo', 'Marry'],
[24, 'John']], dtype=object)
In [31]: df
Out[31]:
age name
student1 21 Marry
student2 24 John
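One way to check the buffer-sharing claim without modifying anything is np.shares_memory. The sketch below is only an illustration under my own assumptions: it uses an all-integer frame (with a made-up height_cm column) to stand in for the single-dtype case, and it calls to_numpy() rather than .values. Exact behaviour may vary with the pandas version, especially where Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

# Single dtype: built from one 2D integer array, so the data lives in one block.
# The height_cm column is hypothetical and only used for illustration.
single = pd.DataFrame(
    np.array([[21, 170], [24, 180]]),
    index=['student1', 'student2'],
    columns=['age', 'height_cm'],
)

# Mixed dtypes: an integer column next to a string column.
mixed = pd.DataFrame(
    {'age': [21, 24], 'name': ['Marry', 'John']},
    index=['student1', 'student2'],
)

# Two calls on the single-dtype frame can hand back the same buffer.
single_shares = np.shares_memory(single.to_numpy(), single.to_numpy())

# The mixed frame has to assemble a fresh object array on every call.
mixed_shares = np.shares_memory(mixed.to_numpy(), mixed.to_numpy())

print(single_shares, mixed_shares)
```

If the two results differ, that matches the explanation above: only the single-dtype frame can expose its data without copying.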
Note, however, that a view is returned only when one is possible, i.e. when the selection is a proper slice. Otherwise, a copy is made regardless of the dtypes:
In [39]: df.loc['student3'] = ['old','Tom']
In [40]: df2
Out[40]:
name
student3 Tom
student2 John
In [41]: df2.loc[:] = 'foo'
In [42]: df2
Out[42]:
name
student3 foo
student2 foo
In [43]: df
Out[43]:
age name
student1 21 Marry
student2 24 John
student3 old Tom
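Since whether you get a view or a copy depends on the internal layout, it is usually safer not to rely on either. A minimal sketch, assuming the goal is either an independent row or an in-place update of the original: take an explicit .copy() for the former, and assign through df.loc with both row and column labels for the latter. The variable names are my own.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [21, 24], 'name': ['Marry', 'John']},
    index=['student1', 'student2'],
)

# Explicit copy: edits to the row never reach the original frame.
row = df.loc['student1'].copy()
row['age'] = 23
unchanged_age = df.loc['student1', 'age']

# Explicit in-place update: a single .loc call on the original frame.
df.loc['student1', 'age'] = 33
updated_age = df.loc['student1', 'age']

print(unchanged_age, updated_age)
```

Written this way, the outcome does not depend on the dtypes of the columns.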
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.