Say I have a dataframe
import pandas as pd import numpy as np foo = pd.DataFrame(np.random.random((10,5)))
and I create another dataframe from a subset of my data:
bar = foo.iloc[3:5,1:4]
does bar hold a copy of those elements from foo? Is there any way to create a view of that data instead? If so, what would happen if I try to modify data in this view? Does Pandas provide any sort of copy-on-write mechanism?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Your answer lies in the pandas docs: returning-a-view-versus-a-copy.
Whenever an array of labels or a boolean vector are involved
in the indexing operation, the result will be a copy.
With single label / scalar indexing and slicing,
e.g. df.ix[3:6] or df.ix[:, ‘A’], a view will be returned.
In your example, bar is a view of slices of foo. If you wanted a copy, you could have used the copy method. Modifying bar also modifies foo. pandas does not appear to have a copy-on-write mechanism.
See my code example below to illustrate:
In [1]: import pandas as pd
...: import numpy as np
...: foo = pd.DataFrame(np.random.random((10,5)))
...:
In [2]: pd.__version__
Out[2]: '0.12.0.dev-35312e4'
In [3]: np.__version__
Out[3]: '1.7.1'
In [4]: # DataFrame has copy method
...: foo_copy = foo.copy()
In [5]: bar = foo.iloc[3:5,1:4]
In [6]: bar == foo.iloc[3:5,1:4] == foo_copy.iloc[3:5,1:4]
Out[6]:
1 2 3
3 True True True
4 True True True
In [7]: # Changing the view
...: bar.ix[3,1] = 5
In [8]: # View and DataFrame still equal
...: bar == foo.iloc[3:5,1:4]
Out[8]:
1 2 3
3 True True True
4 True True True
In [9]: # It is now different from a copy of original
...: bar == foo_copy.iloc[3:5,1:4]
Out[9]:
1 2 3
3 False True True
4 True True True
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0