# Stratosphere

Big Data looks tiny from Stratosphere.

## Start writing a Stratosphere Job
If you just want to get started with Stratosphere, use the following command to set up an empty Stratosphere Job:

```
curl https://raw.github.com/stratosphere/stratosphere-quickstart/master/quickstart.sh | bash
```
The quickstart sample contains everything you need to develop a Stratosphere Job on your computer. No setup needed.


## Build Stratosphere
Below are three short tutorials that guide you through the first steps: Building, running and developing.

###  Build From Source

This tutorial shows how to build Stratosphere on your own system. Please open a bug report if you run into any trouble!

#### Requirements
* Unix-like environment (we use Linux, Mac OS X, and Cygwin)
* git
* Maven (at least version 3.0.4)
* Java 6 or 7
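
Before cloning, it can help to confirm that the tools listed above are available. The following is a minimal sketch, not part of Stratosphere: `version_ge` and `check_tool` are hypothetical helper names, and the Maven version is assumed to be the third word of the `Apache Maven` line printed by `mvn -version`.

```bash
#!/bin/sh
# Sketch: report whether the build prerequisites are on the PATH.
# version_ge and check_tool are hypothetical helpers, not Stratosphere tools.

# Succeeds if dotted version $1 is greater than or equal to $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            # Missing components count as 0; "+ 0" forces numeric comparison.
            xi = x[i] + 0; yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Prints "<tool>: ok" or "<tool>: missing"; never aborts.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

for tool in git mvn java; do
    check_tool "$tool"
done

# Maven must be at least 3.0.4 (assumption: version is the third word
# of the "Apache Maven ..." line).
if command -v mvn >/dev/null 2>&1; then
    mvn_version=$(mvn -version 2>/dev/null | awk '/^Apache Maven/ { print $3; exit }')
    if version_ge "$mvn_version" 3.0.4; then
        echo "maven $mvn_version: ok"
    else
        echo "maven $mvn_version: too old, need at least 3.0.4"
    fi
fi
```

The script only reports and never exits with an error, so it can be run on a machine where some tools are still missing.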

```
git clone https://github.com/stratosphere/stratosphere.git
cd stratosphere
mvn -DskipTests clean package # this will take up to 5 minutes
```

Stratosphere is now installed in `stratosphere-dist/target`.
If you’re a Debian/Ubuntu user, you’ll find a .deb package there. We will continue with the generic case:

	cd stratosphere-dist/target/stratosphere-dist-0.4-SNAPSHOT-bin/stratosphere-0.4-SNAPSHOT/

The directory structure here looks like the contents of the official release distribution.

#### Build for different Hadoop Versions
This section is for advanced users who want to build Stratosphere for a different Hadoop version, for example to get Hadoop YARN support.

We use Maven profile activation via properties (`-D`).

##### Build hadoop v1 (default)
Build the default (currently hadoop 1.2.1)
```mvn clean package```

Build for a specific hadoop v1 version
```mvn -Dhadoop-one.version=1.1.2 clean package```

##### Build hadoop v2 (yarn)

Build for YARN using the default Hadoop 2 version defined in the POM
```mvn -Dhadoop.profile=2 clean package```

Build for a specific hadoop v2 version
```mvn -Dhadoop.profile=2 -Dhadoop-two.version=2.1.0-beta clean package```
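
The four variants above differ only in their `-D` properties. As a rough sketch, they can be generated by one small function; `hadoop_build_cmd` is a hypothetical helper name, and it uses only the properties shown in this section.

```bash
# Sketch: print the Maven command line for a given Hadoop profile.
# hadoop_build_cmd is a hypothetical helper, not part of the build.
hadoop_build_cmd() {
    profile=$1   # 1 (default) or 2 (YARN)
    version=$2   # optional Hadoop version; empty means "use the POM default"
    case "$profile" in
        1)
            if [ -n "$version" ]; then
                echo "mvn -Dhadoop-one.version=$version clean package"
            else
                echo "mvn clean package"
            fi
            ;;
        2)
            if [ -n "$version" ]; then
                echo "mvn -Dhadoop.profile=2 -Dhadoop-two.version=$version clean package"
            else
                echo "mvn -Dhadoop.profile=2 clean package"
            fi
            ;;
        *)
            echo "unknown Hadoop profile: $profile" >&2
            return 1
            ;;
    esac
}

# Example: show the command for a specific Hadoop 2 version.
hadoop_build_cmd 2 2.1.0-beta
```

The function only prints the command, which makes it easy to review before running it.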

It is necessary to generate separate POMs if you want to deploy to your local repository (`mvn install`) or somewhere else.
We have a script in `/tools` that generates POMs for the profiles. Use 
```mvn -f pom.hadoop2.xml clean install -DskipTests```
to put a POM file with the right dependencies into your local repository.


### Run your first program

We will run a simple “Word Count” example.
The easiest way to start Stratosphere on your local machine is the so-called "local mode":

	./bin/start-local.sh

Get some test data:

	 wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt

Start the job:

	./bin/pact-client.sh run --jarfile ./examples/pact/pact-examples-0.4-SNAPSHOT-WordCount.jar --arguments 1 file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt

You will find a file called `wordcount-result.txt` in your current directory.
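
To take a quick look at the result, the most frequent words can be listed with standard tools. This is a sketch under one assumption: each line of the result file holds a word and its count separated by whitespace. If the actual output format differs, the sort key needs adjusting. `top_words` is a hypothetical helper name.

```bash
# Sketch: list the N most frequent words from a WordCount result file.
# Assumes lines of the form "<word> <count>"; top_words is a hypothetical helper.
top_words() {
    file=$1
    n=${2:-10}   # number of lines to show, default 10
    # Sort numerically on the second field, highest count first.
    sort -k2,2nr "$file" | head -n "$n"
}

# Only run the example if the result file exists.
if [ -f wordcount-result.txt ]; then
    top_words wordcount-result.txt 5
fi
```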

#### Alternative Method: Use the PACT web interface
(And get a nice execution plan overview for free!)

	./bin/start-local.sh
	./bin/pact-webfrontend.sh start

Get some test data:

	 wget -O ~/hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt

* Point your browser to http://localhost:8080/launch.html. Upload the WordCount jar using the upload form in the lower-right box. The jar is located in `./examples/pact/pact-examples-0.4-SNAPSHOT-WordCount.jar`.
* Select the WordCount jar from the list of available jars (upper left).
* Enter the argument line in the lower-left box: `1 file://<path to>/hamlet.txt file://<wherever you want the>/wordcount-result.txt`

* Hit “Run Job”
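
The argument line has the same shape in both methods: a leading number followed by `file://` URIs for the input and the output. A small sketch for building it from plain paths is shown here; `to_file_uri` and `wordcount_args` are hypothetical helper names, and the leading number is passed through unchanged.

```bash
# Sketch: build the WordCount argument line from plain file paths.
# to_file_uri and wordcount_args are hypothetical helpers.

# Turns a path into a file:// URI, making relative paths absolute first.
to_file_uri() {
    case "$1" in
        /*) echo "file://$1" ;;
        *)  echo "file://$PWD/$1" ;;
    esac
}

# $1: the leading numeric argument (1 in the examples above)
# $2: input file, $3: output file
wordcount_args() {
    echo "$1 $(to_file_uri "$2") $(to_file_uri "$3")"
}

# Example: arguments for files in the current directory.
wordcount_args 1 hamlet.txt wordcount-result.txt
```

The printed line can be pasted into the web form or passed after `--arguments` on the command line.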


### Eclipse Setup and Debugging

To contribute back to the project or develop your own jobs for Stratosphere, you need a working development environment. We use Eclipse and IntelliJ for development. Here we focus on Eclipse.

If you want to work on the Scala code, you will need the following plugins:

Eclipse 4.x:
  * scala-ide: http://download.scala-ide.org/sdk/e38/scala210/stable/site
  * m2eclipse-scala: http://alchim31.free.fr/m2e-scala/update-site
  * build-helper-maven-plugin: https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.15.0/N/0.15.0.201206251206/

Eclipse 3.7:
  * scala-ide: http://download.scala-ide.org/sdk/e37/scala210/stable/site
  * m2eclipse-scala: http://alchim31.free.fr/m2e-scala/update-site
  * build-helper-maven-plugin: https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/

If you don't have the plugins, the Scala projects will show build errors; you can simply close those projects and ignore them.
Import the Stratosphere source code using Maven's Import tool:
  * Select "Import" from the "File" menu.
  * Expand the "Maven" node, select "Existing Maven Projects", and click the "Next" button.
  * Select the root directory by clicking on the "Browse" button and navigate to the top folder of the cloned Stratosphere Git repository.
  * Ensure that all projects are selected and click the "Finish" button.

Then create a new Eclipse project that has Stratosphere on its build path.

Use this skeleton as an entry point for your own Jobs. It lets you use the “Run As” -> “Java Application” feature of Eclipse. (You have to stop the application manually, because only one instance can run at a time.)

```java
public class Tutorial implements PlanAssembler, PlanAssemblerDescription {

	public static void execute(Plan toExecute) throws Exception {
		LocalExecutor executor = new LocalExecutor();
		executor.start();
		long runtime = executor.executePlan(toExecute);
		System.out.println("runtime:  " + runtime);
		executor.stop();
	}

	@Override
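	public Plan getPlan(String... args) {
		// your Plan goes here; the placeholder below only keeps the skeleton compilable
		throw new UnsupportedOperationException("getPlan is not implemented yet");
	}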

	@Override
	public String getDescription() {
		return "Usage: …. "; // TODO
	}

	public static void main(String[] args) throws Exception {
		Tutorial tut = new Tutorial();
		Plan toExecute = tut.getPlan( /* Arguments */);
		execute(toExecute);
	}
}

```

## Support
Don’t hesitate to ask!

[Open an issue](https://github.com/stratosphere/stratosphere/issues/new) on GitHub if you find a bug or need help.
We also have a [mailing list](https://groups.google.com/d/forum/stratosphere-dev) for both users and developers.

Some of our colleagues are also in the #dima IRC channel on Freenode.

## Documentation

There is our (old) [official Wiki](https://stratosphere.eu/wiki/doku).
We are in the process of migrating it to the [GitHub Wiki](https://github.com/stratosphere/stratosphere/wiki/_pages).

Please edit the Wiki if you find inconsistencies, or [open an issue](https://github.com/stratosphere/stratosphere/issues/new).


## Fork and Contribute

This is an active open-source project. We are always open to people who want to use the system or contribute to it. 
Contact us if you are looking for implementation tasks that fit your skills.

We use the GitHub Pull Request system for the development of Stratosphere. Just open a pull request if you want to contribute.

### What to contribute
* Bug reports
* Bug fixes
* Documentation
* Tools that ease the use and development of Stratosphere
* Well-written Stratosphere jobs


Let us know if you have created a system that uses Stratosphere, so that we can link to you.

## About

[Stratosphere](http://www.stratosphere.eu) is a DFG-funded research project. Ozone is the codename of the latest Stratosphere distribution.
We combine cutting-edge research outcomes with a stable and usable codebase.
Decisions are not made behind closed doors. We discuss all changes and plans on our mailing lists and on GitHub.


Build Status: [![Build Status](https://travis-ci.org/stratosphere/stratosphere.png)](https://travis-ci.org/stratosphere/stratosphere)