I want to check a column for a specific pattern: one letter, a dash, then a four-digit year and a letter, like “A-2012A”. After that prefix, the rest of the column’s value can be anything. I only want to validate this first part and return a true/false value. Is it possible?
pattern letter-yearletter
String validation on one column with regular expression.
example_column_1
| DNA Assay |
|---|
| A-2000X-27 |
| A-2000X-32 |
| A-2000X-45 |
| A-2000X-48 |
| A-2000X-80 |
truth_value = df['DNA Assay'].str.match(r'').astype(bool)
Sample, with nothing in the r'' regular expression.
My expected output would be True
example_column_2
| DNA Assay |
|---|
| Embryo FTA-Code-ID-2 |
| Embryo FTA-Code-ID-3 |
| Embryo FTA-Code-ID-4 |
| Embryo FTA-Code-ID-5 |
| Embryo FTA-Code-ID-6 |
My expected output with example_column_2 would be False
Answers:
Method 1
Use a regex:
df['valid'] = df['DNA Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False)
output:
| | DNA Assay | valid |
|---|---|---|
| 0 | A-2000X-27 | True |
| 1 | A-2000X-32 | True |
| 2 | A-2000X-45 | True |
| 3 | A-2000X-48 | True |
| 4 | A-2000X-80 | True |
If you want to validate all values:
df['DNA Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False).all()
output: True
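To check this against both example columns from the question, here is a minimal, self-contained sketch. The helper name `column_has_prefix` and the `PATTERN` constant are hypothetical, introduced only for illustration, and the sample values are shortened copies of the question's columns. It relies on `Series.str.match` anchoring at the start of each string, so whatever follows the prefix is ignored. Passing `na=False` is an added assumption that missing values should count as invalid.

```python
import pandas as pd

# One letter, a dash, a four-digit year, then one letter (e.g. "A-2012A").
# str.match only anchors at the start, so trailing text such as "-27" is allowed.
PATTERN = r'[A-Z]-\d{4}[A-Z]'


def column_has_prefix(series: pd.Series) -> bool:
    """Hypothetical helper: True only if every value starts with PATTERN."""
    # case=False accepts lowercase letters too; na=False treats missing
    # values as non-matches (an assumption, adjust if NaN should be skipped).
    return bool(series.str.match(PATTERN, case=False, na=False).all())


example_column_1 = pd.Series(
    ['A-2000X-27', 'A-2000X-32', 'A-2000X-45'], name='DNA Assay'
)
example_column_2 = pd.Series(
    ['Embryo FTA-Code-ID-2', 'Embryo FTA-Code-ID-3'], name='DNA Assay'
)

print(column_has_prefix(example_column_1))  # True
print(column_has_prefix(example_column_2))  # False
```

If the whole value has to match instead of just the beginning, `Series.str.fullmatch` would be the closer fit; for the prefix check asked about here, `str.match` should be enough.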
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.