pyspark Archives - Page 2 of 6

Spark Dataframe distinguish columns with duplicated name

August 18, 2022 by Magenaut

As far as I know, a Spark DataFrame can have multiple columns with the same name, as shown in the dataframe snapshot below:

How to change a dataframe column from String type to Double type in PySpark?

August 18, 2022 by Magenaut

I have a dataframe with a column of String type.
I want to change the column type to Double in PySpark.

How to change dataframe column names in pyspark?

August 18, 2022 by Magenaut

I come from a pandas background and am used to reading data from CSV files into a dataframe and then changing the column names to something useful with a simple command:

How to perform union on two DataFrames with different amounts of columns in spark?

August 17, 2022 by Magenaut

I have 2 DataFrames:

How to turn off INFO logging in Spark?

August 17, 2022 by Magenaut

I installed Spark using the AWS EC2 guide. I can launch the program fine using the bin/pyspark script to get to the Spark prompt, and I can also complete the Quick Start guide successfully.

Pyspark: Split multiple array columns into rows

August 17, 2022 by Magenaut

I have a dataframe which has one row, and several columns. Some of the columns are single values, and others are lists. All list columns are the same length. I want to split each list column into a separate row, while keeping any non-list column as is.
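To make the desired transformation concrete, here is a minimal sketch in plain Python rather than Spark. The function name `split_list_columns` and the sample record are hypothetical, chosen only for illustration. It assumes, as the question states, that all list columns have the same length. In PySpark itself, the usual building blocks for this kind of reshaping are `arrays_zip` and `explode` from `pyspark.sql.functions`, though the sketch below does not use them.

```python
from typing import Any, Dict, List


def split_list_columns(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand one record into several records, one per list position.

    List-valued columns contribute their i-th element to the i-th output
    record; all other columns are copied unchanged into every record.
    """
    list_cols = [name for name, value in row.items() if isinstance(value, list)]
    if not list_cols:
        # Nothing to split: return the record as a single row.
        return [dict(row)]

    lengths = {len(row[name]) for name in list_cols}
    if len(lengths) != 1:
        raise ValueError("all list columns must have the same length")
    n = lengths.pop()

    return [
        {
            name: (value[i] if name in list_cols else value)
            for name, value in row.items()
        }
        for i in range(n)
    ]


# Hypothetical one-row input: "b" and "c" are list columns, "a" and "d" are not.
record = {"a": 1, "b": [1, 2, 3], "c": [7, 8, 9], "d": "foo"}
rows = split_list_columns(record)
```

Each element of `rows` is one output record: the list columns are split position by position, while `a` and `d` repeat in every record.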

creating spark data structure from multiline record

How do I add a new column to a Spark DataFrame (using PySpark)?

‘PipelinedRDD’ object has no attribute ‘toDF’ in PySpark

Count number of non-NaN entries in each column of Spark dataframe with Pyspark