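## 2. Spark mailing lists
### 2.1 Mailing list directory
Mailing lists are a good way to follow issues, get the latest resources, debug problems, or contribute code to the Spark project. There are several lists, so work out what each one is for and subscribe only to the ones you care about. Every Apache project has its own mailing lists, split by audience. Apache Spark provides the following subscription addresses: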

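- user-subscribe@spark.apache.org: subscribe to join discussions of problems that ordinary users run into
- dev-subscribe@spark.apache.org: subscribe to join discussions of problems that developers run into; this is the list developers use most
- issues-subscribe@spark.apache.org: subscribe to receive every JIRA creation and update
- commits-subscribe@spark.apache.org: every code commit notification is sent to this list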

To subscribe, send an email to the relevant address above.



## linux
```
tar xzf spark-*.tgz

cd spark-2.4.5-bin-hadoop2.7
```
Add the following to `~/.bashrc` and run `source ~/.bashrc` to apply it (or use `~/.bash_profile` and run `source ~/.bash_profile`).
```
# export HADOOP_HOME=/root/spark-2.4.5-bin-hadoop2.7
export SPARK_HOME=/root/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

# $SPARK_HOME/conf/spark-env.sh

#!/usr/bin/env bash

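export SPARK_MASTER_HOST=192.168.110.216   # address the master binds to
export SPARK_LOCAL_IP=192.168.110.216      # address Spark binds to on this node
export SPARK_MASTER_IP=192.168.110.216     # deprecated since Spark 2.0; superseded by SPARK_MASTER_HOST
export SPARK_MASTER_PORT=7077              # master port (7077 is the default)
export SPARK_WORKER_CORES=1                # cores each worker may use
export SPARK_WORKER_INSTANCES=1            # worker processes per machine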
export  PYSPARK_PYTHON=python3



spark-submit --master spark://192.168.110.216:7077 examples/src/main/python/wordcount.py /root/spark-2.4.5-bin-hadoop2.7/README.md
```
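To make it easier to check the output of the `wordcount.py` job submitted above, here is a plain-Python sketch of the same computation with no Spark involved. It assumes the bundled example splits each line on single spaces and sums one count per word (the usual flatMap, map, reduceByKey pipeline); the `word_count` function name and the sample lines are made up for illustration.

```python
from collections import Counter


def word_count(lines):
    """Count words the way a flatMap -> map -> reduceByKey pipeline would."""
    counts = Counter()
    for line in lines:
        # flatMap step: split every line on single spaces
        for word in line.split(" "):
            # map + reduceByKey steps: emit (word, 1) and sum per word
            counts[word] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input; the real job reads README.md instead.
    sample = ["# Apache Spark", "Spark is a unified analytics engine"]
    for word, count in sorted(word_count(sample).items()):
        print("%s: %i" % (word, count))
```

Because the split is on a single space, consecutive spaces produce empty-string "words", which is worth keeping in mind when comparing counts against the Spark job.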











## windows
- [Spark's master parameter at startup and Spark deployment modes](https://blog.csdn.net/zpf336/article/details/82152286)
- [Differences between the spark-submit submission modes](https://blog.csdn.net/fa124607857/article/details/103390996)

http://spark.apache.org/docs/latest/quick-start.html
### Download and install
```

setx HADOOP_HOME  E:\bigdata\hadoop-3.2.1\
setx SPARK_HOME E:\bigdata\spark-3.0.0-bin-hadoop3.2\

https://github.com/cdarlint/winutils
# Download winutils.exe and place it in E:\bigdata\hadoop-3.2.1\bin
# Add %SPARK_HOME%\bin to PATH

%SPARK_HOME%\bin\spark-submit --version








%SPARK_HOME%\bin\run-example SparkPi  # optional argument, e.g. 10

%SPARK_HOME%\bin\spark-submit examples/src/main/python/pi.py

# http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
# spark-submit --class Test --master spark://localhost:7077 /home/data/myjar/Hello.jar

set SPARK_LOCAL_IP=192.168.1.216
set SPARK_MASTER_HOST=192.168.1.216
%SPARK_HOME%\bin\spark-submit --master spark://192.168.110.216:7077 examples/src/main/python/pi.py 10
```
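For readers wondering what the trailing `10` passed to `pi.py` changes, the following plain-Python sketch mirrors the Monte Carlo estimate that the bundled example is understood to perform. It assumes the script draws 100000 random points per partition and reports four times the fraction that lands inside the unit circle; `estimate_pi` and its parameters are hypothetical names, and no Spark is involved.

```python
import random


def estimate_pi(partitions=2, samples_per_partition=100000, seed=None):
    """Monte Carlo estimate of pi, run serially instead of across partitions."""
    rng = random.Random(seed)
    n = samples_per_partition * partitions
    inside = 0
    for _ in range(n):
        # Draw a point uniformly from the square [-1, 1) x [-1, 1)
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        # Count it when it falls inside the unit circle
        if x * x + y * y <= 1:
            inside += 1
    # Circle area / square area = pi / 4
    return 4.0 * inside / n


if __name__ == "__main__":
    # partitions=10 corresponds to the "10" argument in the command above
    print("Pi is roughly %f" % estimate_pi(partitions=10, seed=0))
```

Under this assumption, a larger argument mainly buys more samples and therefore a tighter estimate; in the Spark version it also spreads the work over more tasks.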

```
bin/spark-submit --master spark://master.hadoop:7077 --class nuc.sw.test.ScalaWordCount spark-1.0-SNAPSHOT.jar hdfs://master.hadoop:9000/spark/input/a.txt hdfs://master.hadoop:9000/spark/output
```

### Interactive shell
The interactive shell serves its web UI at http://localhost:4040/ by default.
```
cd E:\bigdata\spark-2.4.4-bin-hadoop2.7

# python
%SPARK_HOME%\bin\pyspark

>>> textFile = spark.read.text("README.md")
>>> textFile.count() # Number of rows in this DataFrame
105
>>> textFile.first() # First row in this DataFrame
Row(value='# Apache Spark')
>>> linesWithSpark = textFile.filter(textFile.value.contains("Spark"))
>>> textFile.filter(textFile.value.contains("Spark")).count() # How many lines contain "Spark"?
20

>>> sc.parallelize(range(1000)).count() 
1000

# Scala
%SPARK_HOME%\bin\spark-shell

# With PYSPARK_DRIVER_PYTHON set to ipython, the pyspark shell starts in IPython
```


### demo
```
# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/simple-project-1.0.jar


# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit \
  --master local[4] \
  SimpleApp.py
```
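The second command above submits a `SimpleApp.py` that these notes do not include. A minimal sketch, modelled on the self-contained application in the Spark quick start, might look like the following. Two things are assumptions made here for illustration: the `format_report` helper, and taking the input file from the command line rather than hard-coding it. The `pyspark` import sits inside `main` so the file can be loaded without Spark installed.

```python
"""SimpleApp.py: count the lines containing 'a' and 'b' in a text file."""
import sys


def format_report(num_as, num_bs):
    # Hypothetical helper: builds the summary line printed at the end
    return "Lines with a: %i, lines with b: %i" % (num_as, num_bs)


def main(log_file):
    # Imported lazily; spark-submit puts pyspark on the path at run time
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
    # Each line of the file becomes one row with a single "value" column
    log_data = spark.read.text(log_file).cache()
    num_as = log_data.filter(log_data.value.contains("a")).count()
    num_bs = log_data.filter(log_data.value.contains("b")).count()
    print(format_report(num_as, num_bs))
    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit SimpleApp.py <text-file>", file=sys.stderr)
    else:
        main(sys.argv[1])
```

With this variant the file to scan is passed as an extra argument, for example `spark-submit --master local[4] SimpleApp.py README.md`.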