I’m new to Python/pandas and came across this code snippet:
df = df[~df['InvoiceNo'].str.contains('C')]
I would be much obliged if someone could explain what the tilde sign does in this context.
Answers:
Method 1
It means bitwise NOT: it inverts a boolean mask, turning False values into True and True values into False.
Sample:
import pandas as pd

df = pd.DataFrame({'InvoiceNo': ['aaC', 'ff', 'lC'],
                   'a': [1, 2, 5]})
print(df)
  InvoiceNo  a
0       aaC  1
1        ff  2
2        lC  5
# check whether each value in the column contains 'C'
print(df['InvoiceNo'].str.contains('C'))
0     True
1    False
2     True
Name: InvoiceNo, dtype: bool
# invert the mask
print(~df['InvoiceNo'].str.contains('C'))
0    False
1     True
2    False
Name: InvoiceNo, dtype: bool
Filter by boolean indexing:
df = df[~df['InvoiceNo'].str.contains('C')]
print(df)
  InvoiceNo  a
1        ff  2
So the output is all rows of the DataFrame that do not contain C in the InvoiceNo column.
Method 2
It’s used to invert a boolean Series; see the pandas documentation on boolean indexing.
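As a minimal sketch of inverting a boolean Series, here is a small made-up example (the Series `s` and its values are assumptions for illustration, not data from the question). It also shows that `~` can be combined with `&`, as long as each condition sits in its own parentheses:

```python
import pandas as pd

# Hypothetical data, for illustration only
s = pd.Series([1, 5, 10, 15])

mask = s > 5        # boolean Series: False, False, True, True
inverted = ~mask    # element-wise NOT: True, True, False, False

# ~ binds tighter than &, so this reads as (NOT s > 5) AND (s > 1)
selected = s[~(s > 5) & (s > 1)]

print(inverted.tolist())  # [True, True, False, False]
print(selected.tolist())  # [5]
```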
Method 3
df = df[~df['InvoiceNo'].str.contains('C')]
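The statement above removes every row of the DataFrame whose InvoiceNo string contains the letter “C”.
The tilde (~) works as a NOT (!) operator in this scenario: it negates the boolean mask returned by str.contains.
This pattern is generally used to drop rows that match an unwanted condition. It does not by itself remove null values: for a missing InvoiceNo, str.contains may return a missing value rather than True or False unless the na argument (for example na=False) is passed.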
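Whether the filter copes with missing values depends on the data. The following is only a sketch, assuming an invented DataFrame with one missing InvoiceNo (not data from the question): passing `na=False` to `str.contains` makes missing entries count as non-matches, and removing them is a separate `dropna` step.

```python
import pandas as pd

# Hypothetical data with one missing value, for illustration only
df = pd.DataFrame({'InvoiceNo': ['aaC', None, 'ff'], 'a': [1, 2, 5]})

# na=False treats a missing value as "does not contain 'C'",
# so the mask holds only True/False
mask = df['InvoiceNo'].str.contains('C', na=False)

# Rows without 'C'; the row with the missing InvoiceNo is still kept
kept = df[~mask]

# Dropping missing values is a separate, explicit step
no_nulls = kept.dropna(subset=['InvoiceNo'])

print(kept)
print(no_nulls)
```

Here `kept` retains the rows at index 1 and 2, while `no_nulls` retains only the 'ff' row.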
Method 4
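The tilde ~ is the bitwise NOT operator. On a boolean Series it flips each element: True becomes False and False becomes True. (On plain Python integers it is not a 1/0 swap: ~1 is -2 and ~0 is -1.) So you get the rows of df whose InvoiceNo values do not contain the string ‘C’.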
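To make the difference between integers and boolean masks concrete, here is a short sketch (the variable names are chosen only for illustration):

```python
import pandas as pd

# On plain Python integers, ~x inverts every bit, which equals -x - 1
int_results = (~1, ~0)
print(int_results)  # (-2, -1)

# On a boolean pandas Series, ~ flips each element instead
ser = pd.Series([True, False])
flipped = ~ser
print(flipped.tolist())  # [False, True]

# For a single Python bool, `not` is the logical negation
negated = not True
print(negated)  # False
```

This is why `~` is the right tool for a pandas mask, while `not` is the right tool for a single Python boolean.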
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.