apache-spark Archives - Page 4 of 6

Rename nested field in spark dataframe

August 16, 2022 by Magenaut

Having a dataframe df in Spark:

Shipping Python modules in pyspark to other nodes

August 15, 2022 by Magenaut

How can I ship C compiled modules (for example, python-Levenshtein) to each node in a Spark cluster?

Pyspark: explode json in column to multiple columns

August 15, 2022 by Magenaut

The data looks like this –

Spark iteration time increasing exponentially when using join

August 15, 2022 by Magenaut

I’m quite new to Spark and I’m trying to implement an iterative clustering algorithm (expectation-maximization) with each centroid represented by a Markov model. So I need to do iterations and joins.

AttributeError: ‘DataFrame’ object has no attribute ‘map’

August 15, 2022 by Magenaut

I wanted to convert the spark data frame to add using the code below:

Python worker failed to connect back

August 15, 2022 by Magenaut

I’m a newbie with Spark and I’m trying to complete a Spark tutorial:
link to tutorial

Create a custom Transformer in PySpark ML

August 15, 2022 by Magenaut

I am new to Spark SQL DataFrames and ML on them (PySpark).
How can I create a custom tokenizer, which for example removes stop words and uses some libraries from nltk? Can I extend the default one?

Create Spark DataFrame. Can not infer schema for type

August 15, 2022 by Magenaut

Could someone help me solve this problem I have with Spark DataFrame?

Filtering a Pyspark DataFrame with SQL-like IN clause

August 15, 2022 by Magenaut

I want to filter a Pyspark DataFrame with a SQL-like IN clause, as in

Spark RDD to DataFrame python

August 15, 2022 by Magenaut

I am trying to convert a Spark RDD to a DataFrame. I have seen the documentation and examples where the schema is passed to the
sqlContext.createDataFrame(rdd, schema) function.