GroupBy column and filter rows with maximum value in PySpark

I am almost certain this has been asked before, but a search through Stack Overflow did not answer my question. This is not a duplicate of [2], since I want the maximum value, not the most frequent item. I am new to PySpark and am trying to do something really simple: I want to groupBy column “A” and then keep only the row of each group that has the maximum value in column “B”.

How to determine if an object is a valid key-value pair in PySpark

If I have an RDD, how can I tell whether its data is in key:value format? Is there a way to check this, similar to how type(object) tells me an object's type? I tried print type(rdd.take(1)), but it just says <type 'list'>. Let's say I have data like (x,1),(x,2),(y,1),(y,3) and I use groupByKey and …