
Magenaut


rdd

How to find median and quantiles using Spark

August 20, 2022 by Magenaut

How can I find the median of an RDD of integers using a distributed method, IPython, and Spark? The RDD has approximately 700,000 elements and is therefore too large to collect to the driver and compute the median there.
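
One way to approach this without collecting the whole RDD is to sort it, attach positions with zipWithIndex, and bring back only the middle element or elements. The following is a minimal sketch under that assumption, not the answer from the original post. The helper names `median_positions` and `rdd_median` are hypothetical, and `rdd` is assumed to be a non-empty PySpark RDD of numbers.

```python
def median_positions(n):
    """Return the 0-based positions whose values average to the median
    of n sorted items (one position if n is odd, two if n is even)."""
    if n <= 0:
        raise ValueError("median of an empty collection is undefined")
    mid = n // 2
    return [mid] if n % 2 else [mid - 1, mid]


def rdd_median(rdd):
    """Exact median of a numeric RDD (hypothetical helper).

    Sorts the RDD in a distributed way, pairs each value with its
    position, and collects only the one or two middle values.
    """
    n = rdd.count()
    wanted = set(median_positions(n))
    middle = (
        rdd.sortBy(lambda x: x)                   # distributed sort
           .zipWithIndex()                        # (value, position) pairs
           .filter(lambda vi: vi[1] in wanted)    # keep the middle position(s)
           .collect()                             # at most two elements
    )
    values = [value for value, _ in middle]
    return sum(values) / len(values)
```

If the data lives in a DataFrame and an approximate answer is acceptable, `DataFrame.approxQuantile` (available since Spark 2.0) may be a cheaper option than a full sort.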

Categories: Python, Q&A. Tags: apache-spark, median, pyspark, python, rdd

‘PipelinedRDD’ object has no attribute ‘toDF’ in PySpark

August 17, 2022 by Magenaut

I’m trying to load an SVM file and convert it to a DataFrame so I can use the ML module (Pipeline ML) from Spark.
I’ve just installed a fresh Spark 1.5.0 on Ubuntu 14.04 (no spark-env.sh configured).

Categories: Python, Q&A. Tags: apache-spark, apache-spark-sql, pyspark, python, rdd

PySpark DataFrames – way to enumerate without converting to Pandas?

August 14, 2022 by Magenaut

I have a very big pyspark.sql.dataframe.DataFrame named df.
I need some way of enumerating its records, so that I can access a record by its index (or select a group of records by index range).
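
One possible approach is to drop to the underlying RDD, use zipWithIndex to number the rows, and convert back to a DataFrame. The sketch below illustrates that idea and is not taken from the original post. The helper names and the `index` column name are hypothetical, and it assumes the column types can be inferred when the indexed rows are converted back with toDF.

```python
def attach_index(pair):
    """Turn a (row, index) pair from zipWithIndex into one flat tuple
    whose last element is the index."""
    row, index = pair
    return tuple(row) + (index,)


def with_index(df, index_col="index"):
    """Return a new DataFrame with a consecutive 0-based index column
    appended (hypothetical helper). Assumes index_col does not clash
    with an existing column name."""
    return (
        df.rdd
          .zipWithIndex()                       # (Row, position) pairs
          .map(attach_index)                    # flatten to plain tuples
          .toDF(df.columns + [index_col])       # reuse the original names
    )


def select_range(indexed_df, start, stop, index_col="index"):
    """Keep the rows whose index lies in the half-open range [start, stop)."""
    col = indexed_df[index_col]
    return indexed_df.filter((col >= start) & (col < stop))
```

With these helpers, `select_range(with_index(df), 10, 20)` would return rows 10 through 19 without converting anything to Pandas.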

Categories: Python, Q&A. Tags: apache-spark, bigdata, pyspark, python, rdd

Spark union of multiple RDDs

August 14, 2022 by Magenaut

In my Pig code I do this:

Categories: Python, Q&A. Tags: apache-spark, pyspark, python, rdd

Reduce a key-value pair into a key-list pair with Apache Spark

August 13, 2022 by Magenaut

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ..., Vn]). I feel like I should be able to do this using the reduceByKey function with something of the flavor:
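
The excerpt stops before the asker's own attempt, so the following is only a sketch of one common way to build per-key lists: combineByKey with three small functions. The function names are hypothetical, and `pairs_rdd` is assumed to be a PySpark RDD of (key, value) tuples.

```python
def to_list(value):
    """createCombiner: start a new list from the first value seen for a key."""
    return [value]


def append(acc, value):
    """mergeValue: add another value for the same key within a partition."""
    acc.append(value)
    return acc


def extend(acc_a, acc_b):
    """mergeCombiners: merge the partial lists built on different partitions."""
    acc_a.extend(acc_b)
    return acc_a


def values_by_key(pairs_rdd):
    """Turn (K, V1), (K, V2), ... into (K, [V1, V2, ...]) (hypothetical helper)."""
    return pairs_rdd.combineByKey(to_list, append, extend)
```

When nothing needs to be aggregated along the way, `pairs_rdd.groupByKey().mapValues(list)` gives the same shape of result with less code. The order of values within each list is not guaranteed in either case.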

Categories: Python, Q&A. Tags: apache-spark, mapreduce, pyspark, python, rdd
