Pandas data validation with regex on one column

I want to check that each value in a column starts with a specific pattern: one letter, a dash, then a year and a letter, like “A-2012A”. After that prefix, the rest of the value can be anything. I only need to confirm this first part and get back a True/False value. Is this possible?

Pattern: letter-yearletter

String validation on one column with a regular expression.

example_column_1

DNA Assay
A-2000X-27
A-2000X-32
A-2000X-45
A-2000X-48
A-2000X-80

My attempt so far, with the r'' regular expression still empty:

truth_value = df['DNA  Assay'].str.match(r'').astype(bool)

My expected output would be True

example_column_2

DNA Assay
Embryo FTA-Code-ID-2
Embryo FTA-Code-ID-3
Embryo FTA-Code-ID-4
Embryo FTA-Code-ID-5
Embryo FTA-Code-ID-6

My expected output with example_column_2 would be False

Answers:

Method 1

Use str.match with a regex. It only tests the start of each value, so whatever follows the prefix is ignored:

df['valid'] = df['DNA  Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False)

output:

  DNA  Assay  valid
0  A-2000X-27   True
1  A-2000X-32   True
2  A-2000X-45   True
3  A-2000X-48   True
4  A-2000X-80   True
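
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds a few rows from both example columns. It assumes the column name really contains two spaces, as in the question's code, and that "year" means exactly four digits; the names COL, PATTERN, df1 and df2 are only illustrative.

```python
import pandas as pd

# Column name with two spaces, copied from the question's code (an assumption).
COL = "DNA  Assay"
# One letter, a dash, four digits (the year), one letter.
PATTERN = r"[A-Z]-\d{4}[A-Z]"

# A few rows from example_column_1 and example_column_2.
df1 = pd.DataFrame({COL: ["A-2000X-27", "A-2000X-32", "A-2000X-45"]})
df2 = pd.DataFrame({COL: ["Embryo FTA-Code-ID-2", "Embryo FTA-Code-ID-3"]})

# str.match tests only the start of each string, so the "-27" suffix is ignored.
valid1 = df1[COL].str.match(PATTERN, case=False)
valid2 = df2[COL].str.match(PATTERN, case=False)

print(valid1.tolist())
print(valid2.tolist())
```

Every row of the first frame matches and every row of the second does not, which lines up with the expected True and False in the question.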

If you want a single True/False for the whole column, add .all():

df['DNA  Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False).all()

output: True
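
If the column can contain missing values, the whole-column check may need one more decision: should a missing value count as a failure? The sketch below assumes it should and passes na=False to str.match. The helper name column_has_prefix and the sample Series are hypothetical, not part of the original answer.

```python
import pandas as pd

PATTERN = r"[A-Z]-\d{4}[A-Z]"


def column_has_prefix(series: pd.Series, pattern: str = PATTERN) -> bool:
    """Return True only if every value starts with the pattern."""
    # na=False makes missing values count as non-matching instead of NaN.
    matches = series.str.match(pattern, case=False, na=False)
    return bool(matches.all())


good = pd.Series(["A-2000X-27", "a-2012a-rest"])         # all rows match
bad = pd.Series(["Embryo FTA-Code-ID-2", "A-2000X-27"])  # first row fails
with_gap = pd.Series(["A-2000X-27", None])               # missing value fails

print(column_has_prefix(good), column_has_prefix(bad), column_has_prefix(with_gap))
```

Wrapping the result in bool() returns a plain Python bool rather than a NumPy one, which is a little easier to use in an if statement or an assert.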


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
