**Good tools are prerequisite to the successful execution of a job.** – old Chinese proverb
A good programming platform can save you a lot of trouble and time. Here I will only present how to install my favorite programming platform, and only show the easiest way I know to set it up on a Linux system. If you want to install it on another operating system, you can search online for instructions. In this section, you will learn how to set up PySpark on the corresponding programming platforms and packages.
If you don’t have any experience with Linux or Unix operating systems, I recommend using Spark on Databricks Community Cloud: you do not need to set up Spark yourself, and the Community Edition is totally **free**. Please follow the steps listed below.
> You need to save the path that appears after `Uploaded to DBFS`: `/FileStore/tables/05rmhuqv1489687378010/`, since we will use this path to load the dataset.
After finishing the above five steps, you are ready to run your Spark code on Databricks Community Cloud. I will run all the following demos on Databricks Community Cloud. Hopefully, when you run the demo code, you will get the following results:
I strongly recommend that you install [Anaconda](https://www.anaconda.com/download/), since it contains most of the prerequisites and supports multiple operating systems.
1. **Install Python**
Go to the Ubuntu Software Center and follow the steps below:
Java is used by many other software packages, so it is quite possible that you have already installed it. You can check by using the following command in the Command Prompt:
```bash
java -version
```
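If you prefer to check the version programmatically (for example, from a setup script), the version number can be parsed out of the `java -version` output. A small sketch, assuming the usual quoting format of that output (`java_major_version` is a hypothetical helper, not part of any Spark tooling):

```python
import re

def java_major_version(version_output):
    """Extract the Java major version from `java -version` output.

    Java 8 and earlier report versions as "1.x.y"; Java 9+ report
    the major version directly, e.g. "11.0.2".
    """
    m = re.search(r'version "(\d+)\.(\d+)', version_output)
    if m is None:
        return None
    major, minor = int(m.group(1)), int(m.group(2))
    return minor if major == 1 else major

print(java_major_version('java version "1.8.0_181"'))          # → 8
print(java_major_version('java version "11.0.2" 2019-01-15'))  # → 11
```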
Otherwise, you can follow the steps in [How do I install Java for my Mac?](https://java.com/en/download/help/mac_install.xml) to install Java on a Mac, and use the following commands in the terminal to install it on Ubuntu:
```bash
sudo apt-add-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
```
### 3.2.3\. Install Java SE Runtime Environment
I installed ORACLE [Java JDK](http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html).
```
Python 2.7.13 |Anaconda 4.4.0 (x86_64)| (default, Dec 20 2016, 23:05:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/30 13:30:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/30 13:30:17 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Python version 2.7.13 (default, Dec 20 2016 23:05:08)
SparkSession available as 'spark'.
```

Installing open source software on Windows is always a nightmare for me. Thanks to Deelesh Mandloi, you can follow the detailed procedures in the blog post [Getting Started with PySpark on Windows](http://deelesh.github.io/pyspark-windows.html) to install Apache Spark™ on your Windows operating system.
## 3.4\. PySpark With Text Editor or IDE
### 3.4.1\. PySpark With Jupyter Notebook
After finishing the setup steps in [Configure Spark on Mac and Ubuntu](#set-up-ubuntu), you should be ready to write and run your PySpark code in a Jupyter notebook.
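One common way to do this is to tell PySpark to use Jupyter as its driver Python before launching. A sketch for your `bashrc` or `bash_profile` (these are the standard PySpark environment variables, but verify them against your Spark version's documentation):

```bash
# Make the pyspark launcher start a Jupyter notebook instead of the plain shell.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

# Now launching pyspark opens Jupyter with `spark` predefined.
pyspark
```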
After finishing the setup steps in [Configure Spark on Mac and Ubuntu](#set-up-ubuntu), you should be ready to write and run your PySpark code in Apache Zeppelin.
After finishing the setup steps in [Configure Spark on Mac and Ubuntu](#set-up-ubuntu), you should be ready to use Sublime Text to write your PySpark code and to run it as a normal Python script in the terminal.
```bash
python test_pyspark.py
```
Then you should get the output in your terminal.
If you have set up PySpark correctly, you will get the following results:
```
Using Spark defined in the SPARK_HOME=/Users/dt216661/spark environmental property
Python 3.7.1 (default, Dec 14 2018, 13:28:58)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2019-02-15 14:08:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-02-15 14:08:31 WARN Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2019-02-15 14:08:31 WARN Utils:66 - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Python version 3.7.1 (default, Dec 14 2018 13:28:58)
SparkSession available as 'spark'.
```
1. Set up `pysparkling` with the Jupyter notebook

Add the following alias to your `bashrc` (Linux systems) or `bash_profile` (Mac systems):

```bash
alias sparkling='PYSPARK_DRIVER_PYTHON="ipython" PYSPARK_DRIVER_PYTHON_OPTS="notebook" ~/sparkling-water-2.4.5/bin/pysparkling'
```
1. Open `pysparkling` in the terminal

```bash
sparkling
```
## 3.6\. Set up Spark on Cloud
Following the setup steps in [Configure Spark on Mac and Ubuntu](#set-up-ubuntu), you can set up your own cluster on the cloud, for example on AWS or Google Cloud. Actually, those clouds have their own Big Data tools, which you can run directly without any setup, just like Databricks Community Cloud. If you want more details, please feel free to contact me.
The code for this section can be downloaded from [test_pyspark](static/test_pyspark.py), and the Jupyter notebook can be downloaded from [test_pyspark_ipynb](static/test_pyspark.ipynb).