http://spark.apache.org/docs/latest/quick-start.html

## linux
```
tar xzf spark-*.tgz

cd spark-2.4.5-bin-hadoop2.7
```
Add the following to `~/.bashrc` (or `.bash_profile`) and run `source ~/.bashrc` to apply it:
```
# export HADOOP_HOME=/root/spark-2.4.5-bin-hadoop2.7
export SPARK_HOME=/root/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

spark-env.sh

#!/usr/bin/env bash

export  SPARK_MASTER_HOST=192.168.110.216
export  SPARK_LOCAL_IP=192.168.110.216
export  SPARK_MASTER_IP=192.168.110.216
export  SPARK_MASTER_PORT=7077
export  SPARK_WORKER_CORES=1
export  SPARK_WORKER_INSTANCES=1
export  PYSPARK_PYTHON=python3



spark-submit --master spark://192.168.110.216:7077 examples/src/main/python/wordcount.py /root/spark-2.4.5-bin-hadoop2.7/README.md
```
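The `wordcount.py` example submitted above simply splits each line of the input file on whitespace and tallies word occurrences. A minimal sketch of that logic in plain Python (no Spark; the sample input below is made up and stands in for README.md):

```python
from collections import Counter

def word_count(lines):
    """Split each line on whitespace and tally word occurrences,
    mirroring what examples/src/main/python/wordcount.py computes."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

# Hypothetical sample input standing in for README.md
sample = ["Apache Spark", "Spark is fast"]
print(word_count(sample))  # {'Apache': 1, 'Spark': 2, 'is': 1, 'fast': 1}
```

In the real example, the split/count happens as distributed `flatMap`/`reduceByKey` stages across the cluster instead of a single in-memory loop.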

## windows
[The master parameter at Spark startup and Spark deployment modes](https://blog.csdn.net/zpf336/article/details/82152286)
[Differences between spark-submit deploy modes](https://blog.csdn.net/fa124607857/article/details/103390996)

http://spark.apache.org/docs/latest/quick-start.html
### Download and install
```

setx HADOOP_HOME E:\bigdata\spark-2.4.4-bin-hadoop2.7\
setx SPARK_HOME E:\bigdata\spark-2.4.4-bin-hadoop2.7\

%SPARK_HOME%\bin\spark-submit --version

%SPARK_HOME%\bin\run-example SparkPi  # optional argument: 10

%SPARK_HOME%\bin\spark-submit examples/src/main/python/pi.py

# http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
# spark-submit --class Test --master spark://localhost:7077 /home/data/myjar/Hello.jar

set SPARK_LOCAL_IP=192.168.1.216
set SPARK_MASTER_HOST=192.168.1.216
%SPARK_HOME%\bin\spark-submit --master spark://192.168.110.216:7077 examples/src/main/python/pi.py 10
```
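The `pi.py` example run above estimates π by Monte Carlo sampling: it throws random points into a square and counts how many land inside the inscribed circle (the trailing `10` passed to it only controls how many Spark partitions the sampling is split across). A rough sketch of that logic in plain Python, with a seed added here for repeatability:

```python
import random

def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that fall inside the unit circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # approaches 3.14159... as num_samples grows
```

Spark's version distributes the sampling loop over executors with `parallelize(...).map(...)` and sums the hits, but the arithmetic is the same.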

```
bin/spark-submit --master spark://master.hadoop:7077 --class nuc.sw.test.ScalaWordCount spark-1.0-SNAPSHOT.jar hdfs://master.hadoop:9000/spark/input/a.txt hdfs://master.hadoop:9000/spark/output
```

### Interactive shells
Default web UI for the interactive shell: http://localhost:4040/
```
cd E:\bigdata\spark-2.4.4-bin-hadoop2.7

# python
%SPARK_HOME%\bin\pyspark

>>> textFile = spark.read.text("README.md")
>>> textFile.count() # Number of rows in this DataFrame
105
>>> textFile.first() # First row in this DataFrame
Row(value='# Apache Spark')
>>> linesWithSpark = textFile.filter(textFile.value.contains("Spark"))
>>> textFile.filter(textFile.value.contains("Spark")).count() # How many lines contain "Spark"?
20

>>> sc.parallelize(range(1000)).count() 
1000

# Scala
%SPARK_HOME%\bin\spark-shell

# With PYSPARK_DRIVER_PYTHON set to ipython, the pyspark shell runs under IPython
```


### demo
```
# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/simple-project-1.0.jar


# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit \
  --master local[4] \
  SimpleApp.py
```
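In the quick-start guide, `SimpleApp` (both the jar and `SimpleApp.py` variants submitted above) counts the lines of README.md containing the letters "a" and "b". The core filter-and-count step, sketched without Spark in plain Python (the sample lines here are illustrative, not the real README):

```python
def count_lines_containing(lines, ch):
    """Count how many lines contain the given substring, mirroring
    textFile.filter(textFile.value.contains(ch)).count() in the quick start."""
    return sum(1 for line in lines if ch in line)

sample = ["# Apache Spark", "a fast engine", "big data"]
print(count_lines_containing(sample, "a"))  # 3
print(count_lines_containing(sample, "b"))  # 1
```

The `--master local[4]` flag in the submit commands above just tells Spark to run this computation locally on four cores instead of on a cluster.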