# Apache Flink

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.

Learn more about Flink at [https://flink.apache.org/](https://flink.apache.org/)


### Features

* A streaming-first runtime that supports both batch processing and data streaming programs

* Elegant and fluent APIs in Java and Scala

* A runtime that supports very high throughput and low event latency at the same time

* Support for *event time* and *out-of-order* processing in the DataStream API, based on the *Dataflow Model*

* Flexible windowing (time, count, sessions, custom triggers) across different time semantics (event time, processing time)

* Fault-tolerance with *exactly-once* processing guarantees

* Natural back-pressure in streaming programs

* Libraries for Graph processing (batch), Machine Learning (batch), and Complex Event Processing (streaming)

* Built-in support for iterative programs (BSP) in the DataSet (batch) API

* Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms

* Compatibility layers for Apache Hadoop MapReduce

* Integration with YARN, HDFS, HBase, and other components of the Apache Hadoop ecosystem
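The event-time and windowing features above can be sketched against the DataStream Scala API. This is a minimal illustration, not production code: the `Event` case class, its `ts` timestamp field, and the in-memory source are hypothetical placeholders introduced here for the example.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical event type; `ts` is the event-time timestamp in milliseconds
case class Event(key: String, ts: Long, value: Long)

val env = StreamExecutionEnvironment.getExecutionEnvironment

// In-memory source for illustration; a real job would read from Kafka, a socket, etc.
val events: DataStream[Event] = env.fromElements(
  Event("a", 1000L, 1), Event("a", 2500L, 2), Event("b", 1200L, 3))

// Use the events' own timestamps (here assumed ascending) as event time
val withTimestamps = events.assignAscendingTimestamps(_.ts)

// Sum values per key over tumbling 10-second event-time windows
val windowed = withTimestamps
  .keyBy(_.key)
  .window(TumblingEventTimeWindows.of(Time.seconds(10)))
  .sum("value")

windowed.print()

env.execute("Event Time Window Example")
```

With event-time windows, results depend on the timestamps carried by the records themselves rather than on when the machine happens to process them, which is what makes out-of-order processing possible.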


### Streaming Example
```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class WordWithCount(word: String, count: Long)

val env = StreamExecutionEnvironment.getExecutionEnvironment

val text = env.socketTextStream(host, port, '\n')

val windowCounts = text.flatMap { w => w.split("\\s") }
  .map { w => WordWithCount(w, 1) }
  .keyBy("word")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
  .sum("count")

windowCounts.print()

env.execute("Socket Window WordCount")
```

### Batch Example
```scala
import org.apache.flink.api.scala._

case class WordWithCount(word: String, count: Long)

val env = ExecutionEnvironment.getExecutionEnvironment

val text = env.readTextFile(path)

val counts = text.flatMap { w => w.split("\\s") }
  .map { w => WordWithCount(w, 1) }
  .groupBy("word")
  .sum("count")

counts.writeAsCsv(outputPath)

env.execute("Batch WordCount")
```
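As a mental model, the same pipeline can be expressed over plain Scala collections. This sketch only illustrates the semantics of the transformations; it says nothing about Flink's distributed, lazy execution:

```scala
// Plain-collections model of the WordCount pipeline above (illustration only)
val lines = Seq("to be or not to be")

val counts = lines
  .flatMap(_.split("\\s"))                                  // tokenize
  .map(w => (w, 1L))                                        // one count per word
  .groupBy { case (word, _) => word }                       // group by the word itself
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per group

// counts == Map("to" -> 2, "be" -> 2, "or" -> 1, "not" -> 1)
```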



## Building Apache Flink from Source

Prerequisites for building Flink:

* Unix-like environment (we use Linux, Mac OS X, Cygwin, WSL)
* Git
* Maven (we recommend version 3.2.5 and require at least 3.1.1)
* Java 8 or 11 (Java 9 or 10 may work)

```
git clone https://github.com/apache/flink.git
cd flink
mvn clean package -DskipTests # this will take up to 10 minutes
```

Flink is now installed in `build-target`.

*NOTE: Maven 3.3.x can build Flink, but will not properly shade away certain dependencies. Maven 3.1.1 creates the libraries properly.
To build unit tests with Java 8, use Java 8u51 or above to prevent failures in unit tests that use the PowerMock runner.*

## Developing Flink

The Flink committers use IntelliJ IDEA to develop the Flink codebase.
We recommend IntelliJ IDEA for developing projects that involve Scala code.

Minimal requirements for an IDE are:
* Support for Java and Scala (also mixed projects)
* Support for Maven with Java and Scala


### IntelliJ IDEA

The IntelliJ IDE supports Maven out of the box and offers a plugin for Scala development.

* IntelliJ download: [https://www.jetbrains.com/idea/](https://www.jetbrains.com/idea/)
* IntelliJ Scala Plugin: [https://plugins.jetbrains.com/plugin/?id=1347](https://plugins.jetbrains.com/plugin/?id=1347)

Check out our [Setting up IntelliJ](https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/ide_setup.html#intellij-idea) guide for details.

### Eclipse Scala IDE

**NOTE:** From our experience, this setup does not work with Flink
due to deficiencies of the old Eclipse version bundled with Scala IDE 3.0.3 or
due to version incompatibilities with the bundled Scala version in Scala IDE 4.4.1.

**We recommend using IntelliJ instead (see above).**

## Support

Don’t hesitate to ask!

Contact the developers and community on the [mailing lists](https://flink.apache.org/community.html#mailing-lists) if you need any help.

[Open an issue](https://issues.apache.org/jira/browse/FLINK) if you find a bug in Flink.


## Documentation

The documentation of Apache Flink is located on the website: [https://flink.apache.org](https://flink.apache.org)
or in the `docs/` directory of the source code.


## Fork and Contribute

This is an active open-source project. We are always open to people who want to use the system or contribute to it.
Contact us if you are looking for implementation tasks that fit your skills.
This article describes [how to contribute to Apache Flink](https://flink.apache.org/contributing/how-to-contribute.html).


## About

Apache Flink is an open source project of The Apache Software Foundation (ASF).
The Apache Flink project originated from the [Stratosphere](http://stratosphere.eu) research project.