Initializing SparkR (Apache Spark 2.0+) in RStudio (1.0+) on Ubuntu (14.04+)
SparkR
Objective
- Initialize SparkR (Apache Spark 2.0+) in RStudio (1.0+) on Ubuntu (14.04+)
Environment
- Java 1.8.0_111
- Ubuntu 14.04 LTS
How to install
- Download RStudio Desktop
- https://www.rstudio.com/products/rstudio/download3/
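- Load the SparkR package in RStudio
# Attach the SparkR package bundled with the Spark distribution
library(SparkR, lib.loc = "/home/[your spark directory]/spark-2.0.1-bin-hadoop2.7/R/lib")
# Start a local Spark session on all cores with 4 GB of driver memory
sparkR.session(sparkHome = "/home/[your spark directory]/spark-2.0.1-bin-hadoop2.7/", master = "local[*]", sparkConfig = list(spark.driver.memory = "4g"))
# Convert R's built-in faithful dataset to a SparkDataFrame and show its first rows
df <- as.DataFrame(faithful)
head(df)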
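The two calls above repeat the same hard-coded Spark path. As a minimal sketch, assuming the SPARK_HOME environment variable points at the unpacked Spark directory, the path can be built once and reused. The helper name sparkr_lib_path is hypothetical and not part of SparkR.

```r
# Hypothetical helper (not part of SparkR): builds the path to the
# R library directory inside a Spark distribution.
sparkr_lib_path <- function(spark_home) {
  file.path(spark_home, "R", "lib")
}

# Assumption: SPARK_HOME was exported before RStudio was started.
spark_home <- Sys.getenv("SPARK_HOME")

if (nzchar(spark_home) && file.exists(sparkr_lib_path(spark_home))) {
  # Attach SparkR from the Spark distribution and start a local session.
  library(SparkR, lib.loc = sparkr_lib_path(spark_home))
  sparkR.session(
    sparkHome = spark_home,
    master = "local[*]",
    sparkConfig = list(spark.driver.memory = "4g")
  )
} else {
  message("SPARK_HOME is not set or has no R/lib directory; SparkR was not started.")
}
```

If RStudio is launched from the desktop rather than from a shell, SPARK_HOME may not be inherited. In that case it can be set inside the R session with Sys.setenv(SPARK_HOME = ...) before running the block.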
References
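- Thoughts on the SparkR API (2017.07.20)
- Explains the characteristics that make SparkR fit somewhat awkwardly into the Spark programming paradigm.
- A SparkDataFrame (distributed across multiple machines in a Spark cluster) is not the same as R's data.frame (memory-based; local machine).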
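The distinction in the last note can be seen with the faithful data used earlier. This is a rough sketch, assuming SparkR is already attached and a session started with sparkR.session() is running. The threshold of 70 is arbitrary and chosen only for illustration.

```r
# Assumes library(SparkR) is attached and sparkR.session() is running.
df <- as.DataFrame(faithful)               # SparkDataFrame, managed by Spark

# The filter is executed by Spark, not by the local R process.
long_waits <- filter(df, df$waiting > 70)

# collect() copies the matching rows into local R memory.
local_df <- collect(long_waits)

is.data.frame(df)        # a SparkDataFrame is not an R data.frame
is.data.frame(local_df)  # the collected result is an ordinary data.frame

# Base R functions work on the local copy only.
summary(local_df$eruptions)

# Release the Spark session when finished.
sparkR.session.stop()
```

Because collect() pulls every selected row onto the local machine, it is usually safer to filter or aggregate on the SparkDataFrame first and collect only the reduced result.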