I have a column of type string where the values are of the form ‘Jun 2019’, ‘Sep 2020’, etc.
I am trying to extract the year from it, but the to_date function fails to convert the data to a date.
Here is the code I tried:
df = df.withColumn('year_launch', year(to_date(df.launch)))
df.show()
Current outcome
Answers:
Method 1
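You have to pass the format of the date to the to_date function. When the format is omitted, to_date follows the rules for casting a string to a date, which expect ISO-style values such as 2019-06-01 and do not recognize ‘Jun 2019’. The pattern 'MMM yyyy' matches an abbreviated month name followed by a four-digit year.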
from pyspark.sql import functions as F
df.withColumn('year_launch', F.year(F.to_date("launch", 'MMM yyyy'))).show()
Output:
+--------+-----------+
|  launch|year_launch|
+--------+-----------+
|Jun 2019|       2019|
|Sep 2020|       2020|
|Jun 2021|       2021|
|Oct 2021|       2021|
+--------+-----------+
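The same idea can be checked outside Spark with a minimal plain-Python sketch: a default ISO parse rejects ‘Jun 2019’, while an explicit pattern reads it. This is only an analogy, not Spark code. The helper names launch_year and parses_as_iso are made up for illustration, and Python's strptime uses its own directives (%b %Y) rather than Spark's 'MMM yyyy'.

```python
from datetime import date, datetime


def launch_year(text):
    """Parse a 'Mon YYYY' string such as 'Jun 2019' and return its year."""
    # %b = abbreviated month name (English in the default C locale),
    # %Y = four-digit year. The missing day defaults to 1.
    return datetime.strptime(text, "%b %Y").year


def parses_as_iso(text):
    """Return True if text is accepted by the default ISO date parser."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


values = ["Jun 2019", "Sep 2020", "Jun 2021", "Oct 2021"]
years = [launch_year(v) for v in values]

print(years)                      # [2019, 2020, 2021, 2021]
print(parses_as_iso("Jun 2019"))  # False: no pattern was given
```

If the values always end in a four-digit year, parsing with an explicit pattern still has the advantage of rejecting malformed month names instead of silently accepting them.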
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
