PySpark to_date fails to infer format

I have a string column, launch, whose values look like ‘Jun 2019’, ‘Sep 2020’, etc.
I am trying to extract the year from it, but the to_date function seems to fail to convert the values to a date.

Here is the code I tried:

from pyspark.sql.functions import to_date, year

df = df.withColumn('year_launch', year(to_date(df.launch)))
df.show()

Current outcome: (screenshot of the output not preserved)

Answers:


Method 1

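You have to pass the date pattern as the second argument of to_date. Without a format, to_date simply casts the string to a date, and that cast only understands ISO-style values such as 2019-06-01, so a string like ‘Jun 2019’ cannot be parsed.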

from pyspark.sql import functions as F

# 'MMM yyyy' = abbreviated month name followed by a four-digit year
df.withColumn('year_launch', F.year(F.to_date('launch', 'MMM yyyy'))).show()

Output:

+--------+-----------+
|  launch|year_launch|
+--------+-----------+
|Jun 2019|       2019|
|Sep 2020|       2020|
|Jun 2021|       2021|
|Oct 2021|       2021|
+--------+-----------+


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
