Commit 8b480e55 authored by cheddar

Add docs from github wiki

Parent 73a87fe2
@@ -12,3 +12,4 @@ target
 .settings/
 *.log
 *.DS_Store
+_site
Aggregations are specifications of processing over metrics available in Druid.
Available aggregations are:
### Sum aggregators
#### `longSum` aggregator
computes the sum of values as a 64-bit, signed integer
<code>{
"type" : "longSum",
"name" : <output_name>,
"fieldName" : <metric_name>
}</code>
`name` – output name for the summed value
`fieldName` – name of the metric column to sum over
#### `doubleSum` aggregator
Computes the sum of values as a 64-bit floating point value. Similar to `longSum`.
<code>{
"type" : "doubleSum",
"name" : <output_name>,
"fieldName" : <metric_name>
}</code>
### Count aggregator
`count` computes the number of rows that match the filters
<code>{
"type" : "count",
"name" : <output_name>,
}</code>
### Min / Max aggregators
#### `min` aggregator
`min` computes the minimum metric value
<code>{
"type" : "min",
"name" : <output_name>,
"fieldName" : <metric_name>
}</code>
#### `max` aggregator
`max` computes the maximum metric value
<code>{
"type" : "max",
"name" : <output_name>,
"fieldName" : <metric_name>
}</code>
### JavaScript aggregator
Computes an arbitrary JavaScript function over a set of columns (both metrics and dimensions).
All JavaScript functions must return numerical values.
<code>{
"type": "javascript",
"name": "<output_name>",
"fieldNames" : [ <column1>, <column2>, ... ],
"fnAggregate" : "function(current, column1, column2, ...) {
<updates partial aggregate (current) based on the current row values>
return <updated partial aggregate>
}"
"fnCombine" : "function(partialA, partialB) { return <combined partial results>; }"
"fnReset" : "function() { return <initial value>; }"
}</code>
**Example**
<code>{
"type": "javascript",
"name": "sum(log(x)/y) + 10",
"fieldNames": ["x", "y"],
"fnAggregate" : "function(current, a, b) { return current + (Math.log(a) * b); }"
"fnCombine" : "function(partialA, partialB) { return partialA + partialB; }"
"fnReset" : "function() { return 10; }"
}</code>
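Wherever a list of aggregations is expected (for example, the `aggregations` field of a query or the `aggs` field of a batch ingestion `rollupSpec`), the aggregators above are supplied as a JSON array. A minimal sketch, assuming metric columns named `column_4` and `column_5`:
<code>[
  { "type" : "count", "name" : "event_count" },
  { "type" : "doubleSum", "fieldName" : "column_4", "name" : "revenue" },
  { "type" : "longSum", "fieldName" : "column_5", "name" : "clicks" }
]</code>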
Batch Data Ingestion
====================
There are two choices for batch data ingestion into your Druid cluster: you can use the [[Indexing service]] or you can use the `HadoopDruidIndexerMain`. This page describes how to use the `HadoopDruidIndexerMain`.
Which should I use?
-------------------
The [[Indexing service]] is a node that can run as part of your Druid cluster and can accomplish a number of different types of indexing tasks. Even if all you care about is batch indexing, it encapsulates details such as the database used for segment metadata, so that your indexing tasks do not need to include such information. Long-term, the indexing service is going to be the preferred method of ingesting data.
The `HadoopDruidIndexerMain` runs Hadoop jobs in order to separate and index data segments. It takes advantage of Hadoop as a job scheduling and distributed job execution platform. It is a simple method if you already have Hadoop running and don’t want to spend the time configuring and deploying the [[Indexing service]] just yet.
HadoopDruidIndexer
------------------
The indexer is located at `com.metamx.druid.indexer.HadoopDruidIndexerMain` and can be run like:
<code>
java -cp hadoop_config_path:druid_indexer_selfcontained_jar_path com.metamx.druid.indexer.HadoopDruidIndexerMain <config_file>
</code>
The interval is the [ISO8601 interval](http://en.wikipedia.org/wiki/ISO_8601#Time_intervals) of the data you are processing. The config_file is a path to a file (the “specFile”) that contains JSON; an example looks like:
<code>
{
"dataSource": "the_data_source",
"timestampColumn": "ts",
"timestampFormat": "<iso, millis, posix, auto or any Joda time format>",
"dataSpec": {
"format": "<csv, tsv, or json>",
"columns": ["ts", "column_1", "column_2", "column_3", "column_4", "column_5"],
"dimensions": ["column_1", "column_2", "column_3"]
},
"granularitySpec": {
"type":"uniform",
"intervals":["<ISO8601 interval:http://en.wikipedia.org/wiki/ISO_8601#Time_intervals>"],
"gran":"day"
},
"pathSpec": { "type": "granularity",
"dataGranularity": "hour",
"inputPath": "s3n://billy-bucket/the/data/is/here",
"filePattern": ".*" },
"rollupSpec": { "aggs": [
{ "type": "count", "name":"event_count" },
{ "type": "doubleSum", "fieldName": "column_4", "name": "revenue" },
{ "type": "longSum", "fieldName" : "column_5", "name": "clicks" }
],
"rollupGranularity": "minute"},
"workingPath": "/tmp/path/on/hdfs",
"segmentOutputPath": "s3n://billy-bucket/the/segments/go/here",
"leaveIntermediate": "false",
"partitionsSpec": {
"targetPartitionSize": 5000000
},
"updaterJobSpec": {
"type":"db",
"connectURI":"jdbc:mysql://localhost:7980/test_db",
"user":"username",
"password":"passmeup",
"segmentTable":"segments"
}
}
</code>
### Hadoop indexer config
|property|description|required?|
|--------|-----------|---------|
|dataSource|name of the dataSource the data will belong to|yes|
|timestampColumn|the column that is to be used as the timestamp column|yes|
|timestampFormat|the format of timestamps; “auto” means either iso or millis; any [Joda time format](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html) is also accepted|yes|
|dataSpec|a specification of the data format and an array that names all of the columns in the input data|yes|
|dimensions|the columns that are to be used as dimensions|yes|
|granularitySpec|the time granularity and interval to chunk segments up into|yes|
|pathSpec|a specification of where to pull the data in from|yes|
|rollupSpec|a specification of the rollup to perform while processing the data|yes|
|workingPath|the working path to use for intermediate results (results between Hadoop jobs)|yes|
|segmentOutputPath|the path to dump segments into|yes|
|leaveIntermediate|leave behind files in the workingPath when job completes or fails (debugging tool)|no|
|partitionsSpec|a specification of how to partition each time bucket into segments, absence of this property means no partitioning will occur|no|
|updaterJobSpec|a specification of how to update the metadata for the druid cluster these segments belong to|yes|
|registererers|a list of serde handler classnames|no|
### Path specification
There are multiple types of path specification:
##### `granularity`
A type of data loader that expects data to be laid out in a specific path format. Specifically, it expects the data to be segregated by time in the directory format `y=XXXX/m=XX/d=XX/H=XX/M=XX/S=XX` (dates are represented by lowercase, time is represented by uppercase).
|property|description|required?|
|--------|-----------|---------|
|dataGranularity|specifies the granularity to expect the data at, e.g. hour means to expect directories `y=XXXX/m=XX/d=XX/H=XX`|yes|
|inputPath|Base path to append the expected time path to|yes|
|filePattern|Pattern that files should match to be included|yes|
For example, if the sample config were run with the interval 2012-06-01/2012-06-02, it would expect data at the paths
s3n://billy-bucket/the/data/is/here/y=2012/m=06/d=01/H=00
s3n://billy-bucket/the/data/is/here/y=2012/m=06/d=01/H=01
...
s3n://billy-bucket/the/data/is/here/y=2012/m=06/d=01/H=23
### Rollup specification
The indexing process can roll data up as it processes the incoming data. If the data has already been summarized, summarizing it again will produce the same results, so pre-summarized data is not a problem. This section specifies how that rollup should take place.
|property|description|required?|
|--------|-----------|---------|
|aggs|specifies a list of aggregators to aggregate for each bucket (a bucket is defined by the tuple of the truncated timestamp and the dimensions). Aggregators available here are the same as available when querying.|yes|
|rollupGranularity|The granularity to use when truncating incoming timestamps for bucketization|yes|
### Partitioning specification
Segments are always partitioned based on timestamp (according to the granularitySpec) and may be further partitioned in some other way. For example, data for a day may be split by the dimension “last\_name” into two segments: one with all values from A-M and one with all values from N-Z.
To use this option, the indexer must be given a target partition size. It can then find a good set of partition ranges on its own.
|property|description|required?|
|--------|-----------|---------|
|targetPartitionSize|target number of rows to include in a partition; should be a number that targets segments of roughly 700MB~1GB|yes|
|partitionDimension|the dimension to partition on. Leave blank to select a dimension automatically.|no|
|assumeGrouped|assume input data has already been grouped on time and dimensions. This is faster, but can choose suboptimal partitions if the assumption is violated.|no|
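For example, a `partitionsSpec` that targets roughly five million rows per partition and explicitly partitions on a hypothetical `last_name` dimension might look like the following sketch (omit `partitionDimension` to let the indexer choose a dimension automatically):
<code>"partitionsSpec": {
  "targetPartitionSize": 5000000,
  "partitionDimension": "last_name",
  "assumeGrouped": false
}</code>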
### Updater job spec
This is a specification of the properties that tell the job how to update metadata such that the Druid cluster will see the output segments and load them.
|property|description|required?|
|--------|-----------|---------|
|type|“db” is the only value available|yes|
|connectURI|a valid JDBC url to MySQL|yes|
|user|username for db|yes|
|password|password for db|yes|
|segmentTable|table to use in DB|yes|
These properties should match what you have configured for your [[Master]].
# Booting a Single Node Cluster #
[[Loading Your Data]] and [[Querying Your Data]] contain recipes to boot a small druid cluster on localhost. Here we will boot a small cluster on EC2. You can check out the code, or download a tarball from [here](http://static.druid.io/artifacts/druid-services-0.5.51-SNAPSHOT-bin.tar.gz).
The [ec2 run script](https://github.com/metamx/druid/blob/master/examples/bin/run_ec2.sh), `run_ec2.sh`, is located in `examples/bin` if you have checked out the code, or at the root of the project if you've downloaded a tarball. The script relies on the [Amazon EC2 API Tools](http://aws.amazon.com/developertools/351), and you will need to set three environment variables:
```bash
# Setup environment for ec2-api-tools
export EC2_HOME=/path/to/ec2-api-tools-1.6.7.4/
export PATH=$PATH:$EC2_HOME/bin
export AWS_ACCESS_KEY=
export AWS_SECRET_KEY=
```
Then, booting an EC2 instance running one node of each type is as simple as running the script, `run_ec2.sh` :)
# Apache Whirr #
Apache Whirr is a set of libraries for launching cloud services. You can clone a version of Whirr that includes Druid as a service from git@github.com:rjurney/whirr.git:
```bash
git clone git@github.com:rjurney/whirr.git
cd whirr
git checkout trunk
mvn clean install -Dmaven.test.failure.ignore=true -Dcheckstyle.skip
bin/whirr launch-cluster --config recipes/druid.properties
```
Broker
======
The Broker is the node to route queries to if you want to run a distributed cluster. It understands the metadata published to ZooKeeper about what segments exist on what nodes and routes queries such that they hit the right nodes. This node also merges the result sets from all of the individual nodes together.
Forwarding Queries
------------------
Most druid queries contain an interval object that indicates a span of time for which data is requested. Likewise, Druid [[Segments]] are partitioned to contain data for some interval of time and segments are distributed across a cluster. Consider a simple datasource with 7 segments where each segment contains data for a given day of the week. Any query issued to the datasource for more than one day of data will hit more than one segment. These segments will likely be distributed across multiple nodes, and hence, the query will likely hit multiple nodes.
To determine which nodes to forward queries to, the Broker node first builds a view of the world from information in Zookeeper. Zookeeper maintains information about [[Compute]] and [[Realtime]] nodes and the segments they are serving. For every datasource in Zookeeper, the Broker node builds a timeline of segments and the nodes that serve them. When queries are received for a specific datasource and interval, the Broker node performs a lookup into the timeline associated with the query datasource for the query interval and retrieves the nodes that contain data for the query. The Broker node then forwards down the query to the selected nodes.
Caching
-------
Broker nodes employ a distributed cache with an LRU cache invalidation strategy. The broker cache stores per-segment results. The cache can be local to each broker node or shared across multiple nodes using an external distributed cache such as [memcached](http://memcached.org/). Each time a broker node receives a query, it first maps the query to a set of segments. A subset of these segment results may already exist in the cache and the results can be directly pulled from the cache. For any segment results that do not exist in the cache, the broker node will forward the query to the
compute nodes. Once the compute nodes return their results, the broker will store those results in the cache. Real-time segments are never cached and hence requests for real-time data will always be forwarded to real-time nodes. Real-time data is perpetually changing and caching the results would be unreliable.
Running
-------
Broker nodes can be run using the `com.metamx.druid.http.BrokerMain` class.
Configuration
-------------
See [[Configuration]].
### Clone and Build from Source
The other way to set up Druid is from source via git. To do so, run these commands:
```
git clone git@github.com:metamx/druid.git
cd druid
./build.sh
```
You should see a bunch of files:
```
DruidCorporateCLA.pdf README common examples indexer pom.xml server
DruidIndividualCLA.pdf build.sh doc group_by.body install publications services
LICENSE client eclipse_formatting.xml index-common merger realtime
```
You can find the example executables in the examples/bin directory:
* run_example_server.sh
* run_example_client.sh
A Druid cluster consists of various node types that need to be set up depending on your use case. See our [[Design]] docs for a description of the different node types.
Setup Scripts
-------------
One of our community members, [housejester](https://github.com/housejester/), contributed some scripts to help with setting up a cluster. Check out the [github](https://github.com/housejester/druid-test-harness) and [wiki](https://github.com/housejester/druid-test-harness/wiki/Druid-Test-Harness).
Minimum Physical Layout: Absolute Minimum
-----------------------------------------
As a special case, the absolute minimum setup is one of the standalone examples for realtime ingestion and querying; see [[Examples]]. These can easily run on one machine with one core and 1GB of RAM. This layout can be set up to try some basic queries with Druid.
Minimum Physical Layout: Experimental Testing with 4GB of RAM
-------------------------------------------------------------
This layout can be used to load some data from deep storage onto a Druid compute node for the first time. A minimal physical layout for a 1 or 2 core machine with 4GB of RAM is:
1. node1: [[Master]] + metadata service + zookeeper + [[Compute]]
2. transient nodes: indexer
This setup is only reasonable to prove that a configuration works. It would not be worthwhile to use this layout for performance measurement.
Comfortable Physical Layout: Pilot Project with Multiple Machines
-----------------------------------------------------------------
*The machine size “flavors” use AWS/EC2 terminology for descriptive purposes only and are not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work.*
A minimal physical layout, not constrained by cores, that demonstrates parallel querying and realtime ingestion, using AWS EC2 “small”/m1.small (one core, with 1.7GB of RAM) instances or larger, is:
1. node1: [[Master]] (m1.small)
2. node2: metadata service (m1.small)
3. node3: zookeeper (m1.small)
4. node4: [[Broker]] (m1.small or m1.medium or m1.large)
5. node5: [[Compute]] (m1.small or m1.medium or m1.large)
6. node6: [[Compute]] (m1.small or m1.medium or m1.large)
7. node7: [[Realtime]] (m1.small or m1.medium or m1.large)
8. transient nodes: indexer
This layout naturally lends itself to adding more RAM and cores to Compute nodes, and to adding many more Compute nodes. Depending on the actual load, the Master, metadata server, and Zookeeper might need to use larger machines.
High Availability Physical Layout
---------------------------------
*The machine size “flavors” use AWS/EC2 terminology for descriptive purposes only and are not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work.*
An HA layout allows full rolling restarts and heavy volume:
1. node1: [[Master]] (m1.small or m1.medium or m1.large)
2. node2: [[Master]] (m1.small or m1.medium or m1.large) (backup)
3. node3: metadata service (c1.medium or m1.large)
4. node4: metadata service (c1.medium or m1.large) (backup)
5. node5: zookeeper (c1.medium)
6. node6: zookeeper (c1.medium)
7. node7: zookeeper (c1.medium)
8. node8: [[Broker]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
9. node9: [[Broker]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge) (backup)
10. node10: [[Compute]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
11. node11: [[Compute]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
12. node12: [[Realtime]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
13. transient nodes: indexer
Sizing for Cores and RAM
------------------------
The Compute and Broker nodes will use as many cores as are available, depending on usage, so it is best to keep these on dedicated machines. The upper limit of effectively utilized cores is not well characterized yet and would depend on types of queries, query load, and the schema. Compute daemons should have a heap size of at least 1GB per core for normal usage, but could be squeezed into a smaller heap for testing. Since in-memory caching is essential for good performance, even more RAM is better. Broker nodes will use RAM for caching, so they do more than just route queries.
The effective utilization of cores by Zookeeper, MySQL, and Master nodes is likely to be between 1 and 2 for each process/daemon, so these could potentially share a machine with lots of cores. These daemons work with a heap size between 500MB and 1GB.
Storage
-------
Indexed segments should be kept in a permanent store accessible by all nodes, such as AWS S3, HDFS, or equivalent. Currently Druid supports S3, but this will be extended soon.
Local disk (“ephemeral” on AWS EC2) for caching is recommended over network-mounted storage (for example, AWS EBS, Elastic Block Store) in order to avoid network delays during times of heavy usage. If your data center is suitably provisioned for networked storage, perhaps with separate LAN/NICs just for storage, then mounted storage might work fine.
Setup
-----
Setting up a cluster is essentially just firing up all of the nodes you want with the proper [[configuration]]. One thing to be aware of is that there are a few properties in the configuration that potentially need to be set individually for each process:
<code>
druid.server.type=historical|realtime
druid.host=someHostOrIPaddrWithPort
druid.port=8080
</code>
`druid.server.type` should be set to “historical” for your compute nodes and “realtime” for the realtime nodes. The master will only assign segments to a “historical” node and the broker has some intelligence around its ability to cache results when talking to a realtime node. This does not need to be set for the master or the broker.
`druid.host` should be set to the hostname and port that can be used to talk to the given server process. Basically, someone should be able to send a request to http://\${druid.host}/ and actually talk to the process.
`druid.port` should be set to the port that the server should listen on. In the vast majority of cases, this port should be the same as what is on `druid.host`.
Build/Run
---------
The simplest way to build and run from the repository is to run `mvn package` from the base directory and then take `druid-services/target/druid-services-*-selfcontained.jar` and push that around to your machines; the jar does not need to be expanded, and since it contains the main() methods for each kind of service, it is **not** invoked with java -jar. It can be run from a normal java command line by just including it on the classpath and then giving it the main class that you want to run. For example, one instance of the Compute node/service can be started like this:
<code>
java -Duser.timezone=UTC -Dfile.encoding=UTF-8 -cp compute/:druid-services/target/druid-services-*-selfcontained.jar com.metamx.druid.http.ComputeMain
</code>
The following table shows the possible services and fully qualified class for main().
|service|main class|
|-------|----------|
|[[ Realtime ]]|com.metamx.druid.realtime.RealtimeMain|
|[[ Master ]]|com.metamx.druid.http.MasterMain|
|[[ Broker ]]|com.metamx.druid.http.BrokerMain|
|[[ Compute ]]|com.metamx.druid.http.ComputeMain|
Compute
=======
Compute nodes are the workhorses of a cluster. They load up historical segments and expose them for querying.
Loading and Serving Segments
----------------------------
Each compute node maintains a constant connection to Zookeeper and watches a configurable set of Zookeeper paths for new segment information. Compute nodes do not communicate directly with each other or with the master nodes but instead rely on Zookeeper for coordination.
The [[Master]] node is responsible for assigning new segments to compute nodes. Assignment is done by creating an ephemeral Zookeeper entry under a load queue path associated with a compute node. For more information on how the master assigns segments to compute nodes, please see [[Master]].
When a compute node notices a new load queue entry in its load queue path, it will first check a local disk directory (cache) for information about the segment. If no information about the segment exists in the cache, the compute node will download metadata about the new segment from Zookeeper. This metadata includes specifications about where the segment is located in deep storage and about how to decompress and process the segment. For more information about segment metadata and Druid segments in general, please see [[Segments]]. Once a compute node completes processing a segment, the segment is announced in Zookeeper under a served segments path associated with the node. At this point, the segment is available for querying.
Loading and Serving Segments From Cache
---------------------------------------
Recall that when a compute node notices a new segment entry in its load queue path, the compute node first checks a configurable cache directory on its local disk to see if the segment had been previously downloaded. If a local cache entry already exists, the compute node will directly read the segment binary files from disk and load the segment.
The segment cache is also leveraged when a compute node is first started. On startup, a compute node will search through its cache directory and immediately load and serve all segments that are found. This feature allows compute nodes to be queried as soon as they come online.
Querying Segments
-----------------
Please see [[Querying]] for more information on querying compute nodes.
For every query that a compute node services, it will log the query and report metrics on the time taken to run the query.
Running
-------
Compute nodes can be run using the `com.metamx.druid.http.ComputeMain` class.
Configuration
-------------
See [[Configuration]].
Concepts and Terminology
========================
- **Aggregators:** A mechanism for combining records during realtime incremental indexing, Hadoop batch indexing, and in queries.
- **DataSource:** A table-like view of data; specified in a “specFile” and in a query.
- **Granularity:** The time interval corresponding to aggregation by time (a short sketch using the batch ingestion spec fields appears after this list).
- The *indexGranularity* setting in a schema is used to aggregate input (ingest) records within an interval into a single output (internal) record.
- The *segmentGranularity* is the interval specifying how internal records are stored together in a single file.
- **Segment:** A collection of (internal) records that are stored and processed together.
- **Shard:** A unit of partitioning data across machines. TODO: clarify; by time or other dimensions?
- **specFile:** A specification for services in JSON format; see [[Realtime]] and [[Batch-ingestion]].
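As a rough illustration of the two granularities, reusing the batch ingestion spec fields from above (where `granularitySpec.gran` roughly plays the role of segmentGranularity and `rollupSpec.rollupGranularity` the role of indexGranularity), segments could be chunked by day while incoming records are rolled up into one-minute buckets:
<code>"granularitySpec": { "type": "uniform", "gran": "day", "intervals": ["2012-06-01/2012-06-02"] },
"rollupSpec": { "aggs": [ { "type": "count", "name": "event_count" } ], "rollupGranularity": "minute" }</code>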
This page describes the basic server configuration; the same file is loaded by all the server processes. See also the JSON “specFile” descriptions in [[Realtime]] and [[Batch-ingestion]].
JVM Configuration Best Practices
================================
There are three JVM parameters that we set on all of our processes:
1. `-Duser.timezone=UTC` This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they also might uncover weird and interesting bugs.
2. `-Dfile.encoding=UTF-8` This is similar to the timezone setting: we test assuming UTF-8. Local encodings might work, but they also might result in weird and interesting bugs.
3. `-Djava.io.tmpdir=<a path>` Various parts of the system that interact with the file system do so via temporary files, and these files can get somewhat large. Many production systems are set up to have small (but fast) `/tmp` directories, which can be problematic with Druid, so we recommend pointing the JVM’s tmp directory to something with a little more meat.
Basic Service Configuration
===========================
Configuration of the various nodes is done via Java properties. These can either be provided as `-D` system properties on the java command line or they can be passed in via a file called `runtime.properties` that exists on the classpath. Note: as a future item, I’d like to consolidate all of the various configuration into yaml/JSON based configuration files.
The periodic time values (like “PT1M”) are [ISO8601 durations](http://en.wikipedia.org/wiki/ISO_8601#Durations).
An example runtime.properties is as follows:
<code>
# S3 access
com.metamx.aws.accessKey=<S3 access key>
com.metamx.aws.secretKey=<S3 secret_key>
# thread pool size for servicing queries
druid.client.http.connections=30
# JDBC connection string for metadata database
druid.database.connectURI=
druid.database.user=user
druid.database.password=password
# time between polling for metadata database
druid.database.poll.duration=PT1M
druid.database.segmentTable=prod_segments
# Path on local FS for storage of segments; dir will be created if needed
druid.paths.indexCache=/tmp/druid/indexCache
# Path on local FS for storage of segment metadata; dir will be created if needed
druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
druid.request.logging.dir=/tmp/druid/log
druid.server.maxSize=300000000000
# ZK quorum IPs
druid.zk.service.host=
# ZK path prefix for Druid-usage of zookeeper, Druid will create multiple paths underneath this znode
druid.zk.paths.base=/druid
# ZK path for discovery, the only path not to default to anything
druid.zk.paths.discoveryPath=/druid/discoveryPath
# the host:port as advertised to clients
druid.host=someHostOrIPaddrWithPort
# the port on which to listen, this port should line up with the druid.host value
druid.port=8080
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=debug
druid.processing.formatString=processing_%s
druid.processing.numThreads=3
druid.computation.buffer.size=100000000
# S3 dest for realtime indexer
druid.pusher.s3.bucket=
druid.pusher.s3.baseKey=
druid.bard.cache.sizeInBytes=40000000
druid.master.merger.service=blah_blah
</code>
Configuration groupings
-----------------------
### S3 Access
These properties are for connecting with S3 and using it to pull down segments. In the future, we plan on being able to use other deep storage file systems as well, like HDFS. The file system is actually only accessed by the [[Compute]], [[Realtime]] and [[Indexing service]] nodes.
|Property|Description|Default|
|--------|-----------|-------|
|`com.metamx.aws.accessKey`|The access key to use to access S3.|none|
|`com.metamx.aws.secretKey`|The secret key to use to access S3.|none|
|`druid.pusher.s3.bucket`|The bucket to store segments, this is used by Realtime and the Indexing service.|none|
|`druid.pusher.s3.baseKey`|The base key to use when storing segments, this is used by Realtime and the Indexing service|none|
### JDBC connection
These properties specify the jdbc connection and other configuration around the “segments table” database. The only processes that connect to the DB with these properties are the [[Master]] and [[Indexing service]]. This is tested on MySQL.
|Property|Description|Default|
|--------|-----------|-------|
|`druid.database.connectURI`|The jdbc connection uri|none|
|`druid.database.user`|The username to connect with|none|
|`druid.database.password`|The password to connect with|none|
|`druid.database.poll.duration`|The duration between polls the Master does for updates to the set of active segments. Generally defines the amount of lag time it can take for the master to notice new segments|PT1M|
|`druid.database.segmentTable`|The table to use to look for segments.|none|
|`druid.database.ruleTable`|The table to use to look for segment load/drop rules.|none|
|`druid.database.configTable`|The table to use to look for configs.|none|
### Master properties
|Property|Description|Default|
|--------|-----------|-------|
|`druid.master.period`|The run period for the master. The master operates by maintaining the current state of the world in memory and periodically looking at the set of segments available and segments being served to make decisions about whether any changes need to be made to the data topology. This property sets the delay between each of these runs.|PT60S|
|`druid.master.removedSegmentLifetime`|When a node disappears, the master can provide a grace period for how long it waits before deciding that the node really isn’t going to come back and it really should declare that all segments from that node are no longer available. This sets that grace period in number of runs of the master.|1|
|`druid.master.startDelay`|The operation of the Master works on the assumption that it has an up-to-date view of the state of the world when it runs; the current ZK interaction code, however, is written in a way that doesn’t allow the Master to know for a fact that it’s done loading the current state of the world. This delay is a hack to give it enough time to believe that it has all the data.|PT600S|
### Zk properties
See [[ZooKeeper]] for a description of these properties.
### Service properties
These are properties that define various service/HTTP server aspects
|Property|Description|Default|
|--------|-----------|-------|
|`druid.client.http.connections`|Size of connection pool for the Broker to connect to compute nodes. If there are more queries than this number that all need to speak to the same node, then they will queue up.|none|
|`druid.paths.indexCache`|Segments assigned to a compute node are first stored on the local file system and then served by the compute node. This path defines where that local cache resides. Directory will be created if needed|none|
|`druid.paths.segmentInfoCache`|Compute nodes keep track of the segments they are serving so that when the process is restarted they can reload the same segments without waiting for the master to reassign. This path defines where this metadata is kept. Directory will be created if needed|none|
|`druid.http.numThreads`|The number of HTTP worker threads.|10|
|`druid.http.maxIdleTimeMillis`|The amount of time a connection can remain idle before it is terminated|300000 (5 min)|
|`druid.request.logging.dir`|Compute, Realtime and Broker nodes maintain request logs of all of the requests they get (interaction is via POST, so normal request logs don’t generally capture information about the actual query); this specifies the directory to store the request logs in.|none|
|`druid.host`|The host for the current node. This is used to advertise the current processes location as reachable from another node and should generally be specified such that `http://${druid.host}/` could actually talk to this process|none|
|`druid.port`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`|none|
|`druid.processing.formatString`|Realtime and Compute nodes use this format string to name their processing threads.|none|
|`druid.processing.numThreads`|The number of processing threads to have available for parallel processing of segments. Our rule of thumb is `num_cores - 1`, this means that even under heavy load there will still be one core available to do background tasks like talking with ZK and pulling down segments.|none|
|`druid.computation.buffer.size`|This specifies a buffer size for the storage of intermediate results. The computation engine in both the Compute and Realtime nodes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed.|1073741824 (1GB)|
|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services|none|
|`druid.bard.cache.sizeInBytes`|The Broker (called Bard internally) instance has the ability to store results of queries in an in-memory cache. This specifies the number of bytes to use for that cache|none|
### Compute Properties
These are properties that the compute nodes use
|Property|Description|Default|
|--------|-----------|-------|
|`druid.server.maxSize`|The maximum number of bytes worth of segments that the node wants assigned to it. This is not a limit that the compute nodes actually enforce; they just publish it to the master and trust the master to do the right thing.|none|
|`druid.server.type`|Specifies the type of the node. This is published via ZK and depending on the value the node will be treated specially by the Master/Broker. Allowed values are “realtime” or “historical”. This is a configuration parameter because the plan is to allow for a more configurable cluster composition. At the current time, all realtime nodes should just be “realtime” and all compute nodes should just be “historical”.|none|
### Emitter Properties
The Druid servers emit various metrics and alerts via something we call an [[Emitter]]. There are two emitter implementations included with the code, one that just logs to log4j and one that does POSTs of JSON events to a server. More information can be found on the [[Emitter]] page. The properties for using the logging emitter are described below.
|Property|Description|Default|
|--------|-----------|-------|
|`com.metamx.emitter.logging`|Set to “true” to use the logging emitter|none|
|`com.metamx.emitter.logging.level`|Sets the level to log at|debug|
|`com.metamx.emitter.logging.class`|Sets the class to use for logging|com.metamx.emitter.core.LoggingEmitter|
### Realtime Properties
|Property|Description|Default|
|--------|-----------|-------|
|`druid.realtime.specFile`|The file with realtime specifications in it. See [[Realtime]].|none|
If you are interested in contributing to the code, we accept [pull requests](https://help.github.com/articles/using-pull-requests). Note: we have only just completed decoupling our Metamarkets-specific code from the code base and we took some short-cuts in interface design to make it happen. So, there are a number of interfaces that exist right now which are likely to be in flux. If you are embedding Druid in your system, it will be safest for the time being to only extend/implement interfaces that this wiki describes, as those are intended as stable (unless otherwise mentioned).
For issue tracking, we are using the github issue tracker. Please fill out an issue from the Issues tab on the github screen.
We also have a [[Libraries]] page that lists external libraries that people have created for working with Druid.
Deep storage is where segments are stored. It is a storage mechanism that Druid does not provide. This deep storage infrastructure defines the level of durability of your data; as long as Druid nodes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.
The currently supported types of deep storage follow.
## S3-compatible
S3-compatible deep storage is basically either S3 or something like riak-cs, which exposes the same API as S3. This is the default deep storage implementation.
The S3 configuration parameters are:
<code>
com.metamx.aws.accessKey=<S3 access key>
com.metamx.aws.secretKey=<S3 secret_key>
druid.pusher.s3.bucket=<bucket to store in>
druid.pusher.s3.baseKey=<base key prefix to use, i.e. what directory>
</code>
## HDFS
As of 0.4.0, HDFS can be used for storage of segments as well.
In order to use HDFS for deep storage, you need to set the following configuration on your realtime nodes:
<code>
druid.pusher.hdfs=true
druid.pusher.hdfs.storageDirectory=<directory for storing segments>
</code>
If you are using the Hadoop indexer, set your output directory to be a location on Hadoop and it will work.
## Local Mount
A local mount can be used for storage of segments as well. This allows you to use just your local file system or anything else that can be mounted locally, like NFS, Ceph, etc.
In order to use a local mount for deep storage, you need to set the following configuration on your realtime nodes:
<code>
druid.pusher.local=true
druid.pusher.local.storageDirectory=<directory for storing segments>
</code>
Note that you should generally set `druid.pusher.local.storageDirectory` to something different from `druid.paths.indexCache`.
If you are using the Hadoop indexer in local mode, then just give it a local file as your output directory and it will work.
For a comprehensive look at the architecture of Druid, read the [White Paper](http://static.druid.io/docs/druid.pdf).
What is Druid?
==============
Druid is a system built to allow fast (“real-time”) access to large sets of seldom-changing data. It was designed with the intent of being a service and maintaining 100% uptime in the face of code deployments, machine failures and other eventualities of a production system. It can be useful for back-office use cases as well, but design decisions were made explicitly targeting an always-up service.
Druid currently allows for single-table queries in a similar manner to [Dremel](http://research.google.com/pubs/pub36632.html) and [PowerDrill](http://www.vldb.org/pvldb/vol5/p1436_alexanderhall_vldb2012.pdf). It adds to the mix:
1. columnar storage format for partially nested data structures
2. hierarchical query distribution with intermediate pruning
3. indexing for quick filtering
4. realtime ingestion (ingested data is immediately available for querying)
5. fault-tolerant distributed architecture that doesn’t lose data
As far as a comparison of systems is concerned, Druid sits in between PowerDrill and Dremel on the spectrum of functionality. It implements almost everything Dremel offers (Dremel handles arbitrary nested data structures while Druid only allows for a single level of array-based nesting) and gets into some of the interesting data layout and compression methods from PowerDrill.
Druid is a good fit for products that require real-time data ingestion of a single, large data stream, especially if you are targeting no-downtime operation and are building your product on top of a time-oriented summarization of the incoming data stream. Druid is probably not the right solution if you care more about query flexibility and raw data access than query speed and no-downtime operation. When talking about query speed, it is important to clarify what “fast” means: with Druid it is entirely within the realm of possibility (we have done it) to achieve queries that run in single-digit seconds across a 6TB data set.
### Architecture
Druid is architected as a grouping of systems, each with a distinct role, that together form a working system. The name comes from the Druid class in many role-playing games: it is a shape-shifter, capable of taking many different forms to fulfill various different roles in a group.
The node types that currently exist are:
* **Compute** nodes are the workhorses that handle storage and querying of “historical” (non-realtime) data.
* **Realtime** nodes ingest data in real time; they are in charge of listening to a stream of incoming data and making it available immediately inside the Druid system. As the data they have ingested ages, they hand it off to the compute nodes.
* **Master** nodes act as coordinators. They look over the grouping of computes and make sure that data is available, replicated and in a generally “optimal” configuration.
* **Broker** nodes understand the topology of data across all of the other nodes in the cluster and re-write and route queries accordingly.
* **Indexer** nodes form a cluster of workers to load batch and real-time data into the system as well as allow for alterations to the data stored in the system (also known as the Indexing Service).
This separation allows each node to only care about what it is best at. By separating Compute and Realtime, we separate the memory concerns of listening on a real-time stream of data and processing it for entry into the system. By separating the Master and Broker, we separate the needs for querying from the needs for maintaining “good” data distribution across the cluster.
All nodes can be run in some highly available fashion, either as symmetric peers in a share-nothing cluster or as hot-swap failover nodes.
Aside from these nodes, there are 3 external dependencies to the system:
1. A running [ZooKeeper](http://zookeeper.apache.org/) cluster for cluster service discovery and maintenance of current data topology
2. A MySQL instance for maintenance of metadata about the data segments that should be served by the system
3. A “deep storage” LOB store/file system to hold the stored segments
### Data Storage
Getting data into the Druid system requires an indexing process. This gives the system a chance to analyze the data, add indexing structures, compress and adjust the layout in an attempt to optimize query speed. A quick list of what happens to the data follows.
- Converted to columnar format
- Indexed with bitmap indexes
- Compressed using various algorithms
- LZF (switching to Snappy is on the roadmap, not yet implemented)
- Dictionary encoding w/ id storage minimization
- Bitmap compression
- RLE (on the roadmap, but not yet implemented)
The output of the indexing process is stored in a “deep storage” LOB store/file system (see [[Deep Storage]] for information about potential options). Data is then loaded by compute nodes by first downloading it to their local disk and then memory mapping it before serving queries.
If a compute node dies, it will no longer serve its segments, but given that the segments are still available on the “deep storage” any other node can simply download the segment and start serving it. This means that it is possible to actually remove all compute nodes from the cluster and then re-provision them without any data loss. It also means that if the “deep storage” is not available, the nodes can continue to serve the segments they have already pulled down (i.e. the cluster goes stale, not down).
In order for a segment to exist inside of the cluster, an entry has to be added to a table in a MySQL instance. This entry is a self-describing bit of metadata about the segment; it includes things like the schema of the segment, the size, and the location on deep storage. These entries are what the Master uses to know what data **should** be available on the cluster.
### Fault Tolerance
- **Compute** As discussed above, if a compute node dies, another compute node can take its place and there is no fear of data loss
- **Master** Can be run in a hot fail-over configuration. If no masters are running, then changes to the data topology will stop happening (no new data and no data balancing decisions), but the system will continue to run.
- **Broker** Can be run in parallel or in hot fail-over.
- **Realtime** Depending on the semantics of the delivery stream, multiple of these can be run in parallel processing the exact same stream. They periodically checkpoint to disk and eventually push out to the Computes. Steps are taken to be able to recover from process death, but loss of access to the local disk can result in data loss if this is the only method of adding data to the system.
- **“deep storage” file system** If this is not available, new data will not be able to enter the cluster, but the cluster will continue operating as is.
- **MySQL** If this is not available, the master will be unable to find out about new segments in the system, but it will continue with its current view of the segments that should exist in the cluster.
- **ZooKeeper** If this is not available, data topology changes will not be able to be made, but the Brokers will maintain their most recent view of the data topology and continue serving requests accordingly.
### Query processing
A query first enters the Broker, where the broker will match the query with the data segments that are known to exist. It will then pick a set of machines that are serving those segments and rewrite the query for each server to specify the segment(s) targeted. The Compute/Realtime nodes will take in the query, process it and return results. The Broker then takes the results and merges them together to get the final answer, which it returns. In this way, the broker can prune all of the data that doesn’t match a query before ever looking at a single row of data.
For filters at a more granular level than what the Broker can prune based on, the indexing structures inside each segment allow the compute nodes to figure out which (if any) rows match the filter set before looking at any row of data. A compute node can do all of the boolean algebra of the filter on the bitmap indices and never actually look directly at a row of data.
Once it knows the rows that match the current query, it can access the columns it cares about for those rows directly without having to load data that it is just going to throw away.
The following diagram shows the data flow for queries without showing batch indexing:
![Simple Data Flow](https://raw.github.com/metamx/druid/master/doc/data_flow_simple.png "Simple Data Flow")
### In-memory?
Druid is not always and only in-memory. When we first built it, it is true that it was all in-memory all the time, but as time went on the price-performance tradeoff ended up making it a non-starter to keep all of our customers’ data in memory all the time. We then added the ability to memory map data and allow the OS to handle paging data in and out of memory on demand. Our production cluster is primarily configured to operate with this memory mapping behavior, and we are definitely over-subscribed in terms of memory available vs. data a node is serving.
As you read some of the old blog posts or other literature about the project, you will often see “in-memory” touted, as that is the history of where Druid came from, but the technical reality is that there is a spectrum of price vs. performance, and being able to slide along it from all in-memory (high cost, great performance) to mostly on disk (low cost, low performance) is the important knob to be able to adjust.
A version may be declared as a release candidate if it has been deployed to a sizable production cluster. Release candidates are declared as stable after we feel fairly confident there are no major bugs in the version. Check out the [[Versioning]] section for how we describe software versions.
Release Candidate
-----------------
There is no release candidate at this time.
Stable Release
--------------
The current stable is tagged at version [0.5.49](https://github.com/metamx/druid/tree/druid-0.5.49).
# Druid Personal Demo Cluster (DPDC)
Note, there are currently some issues with the CloudFormation. We are working through them and will update the documentation here when things work properly. In the meantime, the simplest way to get your feet wet with a cluster setup is to run through the instructions at [housejester/druid-test-harness](https://github.com/housejester/druid-test-harness), though it is based on an older version. If you just want to get a feel for the types of data and queries that you can issue, check out [[Realtime Examples]]
## Introduction
To make it easy for you to get started with Druid, we created an AWS (Amazon Web Services) [CloudFormation](http://aws.amazon.com/cloudformation/) Template that allows you to create a small pre-configured Druid cluster using your own AWS account. The cluster contains a pre-loaded sample workload, the Wikipedia edit stream, and a basic query interface that gets you familiar with Druid capabilities like drill-downs and filters.
This guide walks you through the steps to create the cluster and then how to create basic queries. (The cluster setup should take you about 15-20 minutes depending on AWS response times).
## What’s in this Druid Demo Cluster?
1. A single "Master" node. This node co-locates the [[Master]] process, the [[Broker]] process, Zookeeper, and the MySQL instance. You can read more about Druid architecture [[Design]].
1. Three compute nodes; these compute nodes have been pre-configured to work with the Master node and should automatically load up the Wikipedia edit stream data (no specific setup is required).
## Setup Instructions
1. Log in to your AWS account: Start by logging into the [Console page](https://console.aws.amazon.com) of your AWS account; if you don’t have one, follow this link to sign up for one [http://aws.amazon.com/](http://aws.amazon.com/).
![AWS Console Page](images/demo/setup-01-console.png)
1. If you have a [Key Pair](http://docs.aws.amazon.com/gettingstarted/latest/wah/getting-started-create-key-pair.html) already created you may skip this step. Note: this is required to create the demo cluster and is generally not used unless instances need to be accessed directly (e.g. via SSH).
1. Click **EC2** to go to the EC2 Dashboard. From there, click **Key Pairs** under Network & Security.
![EC2 Dashboard](images/demo/setup-02a-keypair.png)
1. Click on the button **Create Key Pair**. A dialog box will appear prompting you to enter a Key Pair name (as long as you remember it, the name is arbitrary; for this example we entered `Druid`). Click **Create**. You will be prompted to download a .pem file; store this file in a safe place.
![Create Key Pair](images/demo/setup-02b-keypair.png)
1. Unless you’re there already, go back to the Console page, or follow this link: https://console.aws.amazon.com. Click **CloudFormation** under Deployment & Management.
![CloudFormation](images/demo/setup-03-ec2.png)
1. Click **Create New Stack**, which will bring up the **Create Stack** dialog.
![Create New Stack](images/demo/setup-04-newstack.png)
1. Enter a **Stack Name** (it’s arbitrary; we chose `DruidStack`). Click **Provide a Template URL** and type in the following template URL: _**https://s3.amazonaws.com/cf-templates-jm2ikmzj3y6x-us-east-1/2013081cA9-Druid04012013.template**_. Press **Continue**; this will take you to the Create Stack dialog.
![Stack Name & URL](images/demo/setup-05-createstack.png)
1. Enter `Druid` (or the Key Pair name you created in Step 2) in the **KeyPairName** field; click **Continue**. This should bring up another dialog prompting you to enter a **Key** and **Value**.
![Stack Parameters](images/demo/setup-06-parameters.png)
1. While the inputs are arbitrary, it’s important to remember this information; we chose to enter `version` for **Key** and `1` for **Value**. Press **Continue** to bring up a confirmation dialog.
![Add Tags](images/demo/setup-07a-tags.png)
1. Click **Continue** to start creating your Druid Demo environment (this will bring up another dialog box indicating your environment is being created; click **Close** to take you to a more detailed view of the Stack creation process). Note: depending on AWS, this step could take over 15 minutes – initialization continues even after the instances are created. (So yes, now would be a good time to grab that cup of coffee).
![Review](images/demo/setup-07b-review.png)
![Create Stack Complete](images/demo/setup-07c-complete.png)
1. Click and expand the **Events** tab in the CloudFormation Stacks window to get a more detailed view of the Druid Demo Cluster setup.
![CloudFormations](images/demo/setup-09-events.png)
1. Get the IP address of your Druid Master Node:
1. Go to the following URL: [https://console.aws.amazon.com/ec2](https://console.aws.amazon.com/ec2)
1. Click **Instances** in the left pane – you should see something similar to the following figure.
1. Select the **DruidMaster** instance
1. Your IP address is right under the heading: **EC2 Instance: DruidMaster**. Select and copy that entire line, which ends with `amazonaws.com`.
![EC2 Instances](images/demo/setup-10-ip.png)
## Querying Data
1. Use the following URL to bring up the Druid Demo Cluster query interface (replace **IPAddressDruidMaster** with the actual druid master IP Address):
**`http://IPAddressDruidMaster:8082/druid/v3/demoServlet`**
As you can see from the image below, there are default values in the Dimensions and Granularity fields. Clicking **Execute** will produce a basic query result.
![Demo Query Interface](images/demo/query-1.png)
1. Note: when the query is running, the **Execute** button will be disabled and read **Fetching…**
![Demo Query](images/demo/query-2.png)
1. You can add multiple Aggregation values and adjust Granularity and Dimensions; query results will appear at the bottom of the window.
Enjoy! Please send along your comments, feedback, or aspirations on expanding and developing this demo to https://groups.google.com/d/forum/druid-development. Attention R users: we just open-sourced our R Druid connector: https://github.com/metamx/RDruid.
We are not experts on Cassandra; if anything is incorrect about our portrayal, please let us know on the mailing list or via some other means, and we will fix this page.
Druid is highly optimized for scans and aggregations; it supports arbitrarily deep drill-downs into data sets without the need to pre-compute, and it can ingest event streams in real time and allow users to query events as they come in. Cassandra is a great key-value store and it has some features that allow you to use it to do more interesting things than what you can do with a pure key-value store. But it is not built for the same use cases that Druid handles, namely regularly scanning over billions of entries per query.
Furthermore, Druid is fully read-consistent. Druid breaks down a data set into immutable chunks known as segments. All replicants always present the exact same view for the piece of data they are holding and we don’t have to worry about data synchronization. The tradeoff is that Druid has limited semantics for write and update operations. Cassandra, similar to Amazon’s Dynamo, has an eventually consistent data model. Writes are always supported but updates to data may take some time before all replicas sync up (data reconciliation is done at read time). This model favors availability and scalability over consistency.
Druid is a complementary addition to Hadoop. Hadoop is great at storing and making accessible large amounts of individually low-value data. Unfortunately, Hadoop is not great at providing query speed guarantees on top of that data, nor does it have very good operational characteristics for a customer-facing production system. Druid, on the other hand, excels at taking high-value summaries of the low-value data on Hadoop, making it available in a fast and always-on fashion, such that it could be exposed directly to a customer.
Druid also requires some infrastructure to exist for “deep storage”. HDFS is one of the implemented options for this “deep storage”.
The question of Druid versus Impala or Shark basically comes down to your product requirements and what the systems were designed to do.
Druid was designed to
1. be an always on service
1. ingest data in real-time
1. handle slice-n-dice style ad-hoc queries
Impala and Shark's primary design concerns (as far as I am aware) were to replace Hadoop MapReduce with another, faster, query layer that is completely generic and plays well with the rest of the Hadoop ecosystem of technologies. I will caveat this discussion with the statement that I am not an expert on Impala or Shark, nor am I intimately familiar with their roadmaps. If anything is incorrect on this page, I'd be happy to change it; please send a note to the mailing list.
What does this mean? We can talk about it in terms of four general areas
1. Fault Tolerance
1. Query Speed
1. Data Ingestion
1. Query Flexibility
## Fault Tolerance
Druid pulls segments down from [[Deep Storage]] before serving queries on top of them. This means that for the data to exist in the Druid cluster, it must exist as a local copy on a historical node. If deep storage becomes unavailable for any reason, new segments will not be loaded into the system, but the cluster will continue to operate exactly as it was when the backing store disappeared.
Impala and Shark, on the other hand, pull their data in from HDFS (or some other Hadoop FileSystem) in response to a query. This has implications for the operation of queries if you need to take HDFS down for a bit (say a software upgrade). It's possible that data that has been cached in the nodes is still available when the backing file system goes down, but I'm not sure.
This is just one example, but Druid was built to continue operating in the face of failures of any one of its various pieces. The [[Design]] describes these design decisions from the Druid side in more detail.
## Query Speed
Druid takes control of the data given to it, storing it in a column-oriented fashion, compressing it, and adding indexing structures, all of which add to the speed at which queries can be processed. The column orientation means that we only look at the data a query asks for in order to compute the answer. Compression increases the data storage capacity of RAM and allows us to fit more data into quickly accessible RAM. Indexing structures mean that as you add boolean filters to your queries, we do less processing and you get your result faster, whereas a lot of processing engines do *more* processing when filters are added.
Impala/Shark can basically be thought of as daemon caching layers on top of HDFS. They are processes that stay on even if there is no query running (eliminating the JVM startup costs from Hadoop MapReduce) and they have facilities to cache data locally so that it can be accessed and updated more quickly. But I do not believe they go beyond caching capabilities to actually speed up queries, so at the end of the day they do not move away from a brute-force, scan-everything query processing paradigm.
## Data Ingestion
Druid is built to allow for real-time ingestion of data. You can ingest data and query it immediately upon ingestion; the latency between when an event occurs and when it is reflected in query results is dominated by how long it takes to deliver the event to Druid.
Impala/Shark, being based on data in HDFS or some other backing store, are limited in their data ingestion rates by the rate at which that backing store can make data available. Generally, the backing store is the biggest bottleneck for how quickly data can become available.
## Query Flexibility
Druid supports timeseries and groupBy style queries. It doesn't have support for joins, which makes it a lot less flexible for generic processing.
Impala/Shark support SQL style queries with full joins.
### How does Druid compare to Redshift?
In terms of drawing a differentiation, Redshift is essentially ParAccel (Actian) which Amazon is licensing.
Aside from potential performance differences, there are some functional differences:
### Real-time data ingestion
Because Druid is optimized to provide insight against massive quantities of streaming data, it is able to load and aggregate data in real-time.
Generally, traditional data warehouses, including column stores, work only with batch ingestion and are not optimal for streaming data in regularly.
### Druid is a read oriented analytical data store
Its write semantics aren’t as fluid, and it does not support joins. ParAccel is a full database with SQL support including joins and insert/update statements.
### Data distribution model
Druid’s data distribution is segment-based and relies on highly available “deep” storage, like S3 or HDFS. Scaling up (or down) does not require massive copy actions or downtime; in fact, losing any number of compute nodes does not result in data loss because new compute nodes can always be brought up by reading data from “deep” storage.
To contrast, ParAccel’s data distribution model is hash-based. Expanding the cluster requires re-hashing the data across the nodes, making it difficult to perform without taking downtime. Amazon’s Redshift works around this issue with a multi-step process:
* set cluster into read-only mode
* copy data from cluster to new cluster that exists in parallel
* redirect traffic to new cluster
### Replication strategy
Druid employs segment-level data distribution meaning that more nodes can be added and rebalanced without having to perform a staged swap. The replication strategy also makes all replicas available for querying.
ParAccel’s hash-based distribution generally means that replication is conducted via hot spares. This puts a numerical limit on the number of nodes you can lose without losing data, and this replication strategy often does not allow the hot spare to help share query load.
### Indexing strategy
Along with column oriented structures, Druid uses indexing structures to speed up query execution when a filter is provided. Indexing structures do increase storage overhead (and make it more difficult to allow for mutation), but they can also significantly speed up queries.
ParAccel does not appear to employ indexing strategies.
### How does Druid compare to Vertica?
Vertica is similar to ParAccel/Redshift ([[Druid-vs-Redshift]]) described above in that it wasn’t built for real-time streaming data ingestion and it supports full SQL.
The other big difference is that instead of employing indexing, Vertica tries to optimize processing by leveraging run-length encoding (RLE) and other compression techniques along with a “projection” system that creates materialized copies of the data in a different sort order (to maximize the effectiveness of RLE).
We are unclear about how Vertica handles data distribution and replication, so we cannot speak to if/how Druid is different.
Examples
========
The examples on this page are set up to give you a feel for what Druid does in practice. They are quick demos of Druid based on [RealtimeStandaloneMain](https://github.com/metamx/druid/blob/master/examples/src/main/java/druid/examples/RealtimeStandaloneMain.java). While you wouldn’t run it this way in production, you should be able to see how ingestion works and the kind of exploratory queries that are possible. Everything that can be done on your box here can be scaled out to tens of billions of events and terabytes of data per day in a production cluster, while still delivering snappy, responsive, exploratory queries.
Installing Standalone Druid
---------------------------
There are two options for installing standalone Druid: building from source, or downloading the Druid Standalone Kit (DSK).
### Building from source
Clone Druid and build it:
<code>git clone https://github.com/metamx/druid.git druid
cd druid
git fetch --tags
git checkout druid-0.4.30
./build.sh
</code>
### Downloading the DSK (Druid Standalone Kit)
[Download](http://static.druid.io/data/examples/druid-services-0.4.6.tar.gz) a stand-alone tarball and unpack it:
<code>
tar -xzf druid-services-0.X.X-SNAPSHOT-bin.tar.gz
cd druid-services-0.X.X-SNAPSHOT
</code>
Twitter Example
---------------
For a full tutorial based on the twitter example, check out this [[Twitter Tutorial]].
This example uses a feature of Twitter that allows for sampling of its stream. We sample the Twitter stream via our [TwitterSpritzerFirehoseFactory](https://github.com/metamx/druid/blob/master/examples/src/main/java/druid/examples/twitter/TwitterSpritzerFirehoseFactory.java) class and use it to simulate the kinds of data you might ingest into Druid. Then, with the client part, the sample shows what kinds of analytics explorations you can do during and after the data is loaded.
### What you’ll learn
* See how large amounts of data get ingested into Druid in real-time
* Learn how to do fast, interactive, analytics queries on that real-time data
### What you need
* A build of standalone Druid with the Twitter example (see above)
* A Twitter username and password.
### What you’ll do
See [[Tutorial]]
Rand Example
------------
This uses `RandomFirehoseFactory` which emits a stream of random numbers (outColumn, a positive double) with timestamps along with an associated token (target). This provides a timeseries that requires no network access for demonstration, characterization, and testing. The generated tuples can be thought of as asynchronously produced triples (timestamp, outColumn, target) where the timestamp varies depending on speed of processing.
In a terminal window, (NOTE: If you are using the cloned Github repository these scripts are in ./examples/bin) start the server with:
`./run_example_server.sh`
`# type rand when prompted`
In another terminal window:
`./run_example_client.sh`
`# type rand when prompted`
The result of the client query is in JSON format. The client makes a REST request using the program `curl` which is usually installed on Linux, Unix, and OSX by default.
A filter is a JSON object indicating which rows of data should be included in the computation for a query. It’s essentially the equivalent of the WHERE clause in SQL. Druid supports the following types of filters.
### Selector filter
The simplest filter is a selector filter. The selector filter will match a specific dimension with a specific value. Selector filters can be used as the base filters for more complex Boolean expressions of filters.
The grammar for a SELECTOR filter is as follows:
<code>"filter": {
"type": "selector",
"dimension": <dimension_string>,
"value": <dimension_value_string>
}
</code>
This is the equivalent of `WHERE <dimension_string> = '<dimension_value_string>'`.
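For example, the following (using a hypothetical dimension named `country`) matches rows where `country` equals `United States`, i.e. the equivalent of `WHERE country = 'United States'`:
<code>"filter": {
    "type": "selector",
    "dimension": "country",
    "value": "United States"
}
</code>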
### Regular expression filter
The regular expression filter is similar to the selector filter, but using regular expressions. It matches the specified dimension with the given pattern. The pattern can be any standard [Java regular expression](http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html).
<code>"filter": {
"type": "regex",
"dimension": <dimension_string>,
"pattern": <pattern_string>
}
</code>
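For example, the following (again with a hypothetical dimension `country`) matches dimension values that start with `United`:
<code>"filter": {
    "type": "regex",
    "dimension": "country",
    "pattern": "^United.*"
}
</code>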
### Logical expression filters
#### AND
The grammar for an AND filter is as follows:
<code>"filter": {
"type": "and",
"fields": [<filter>, <filter>, ...]
}
</code>
The filters in fields can be any other filter defined on this page.
#### OR
The grammar for an OR filter is as follows:
<code>"filter": {
"type": "or",
"fields": [<filter>, <filter>, ...]
}
</code>
The filters in fields can be any other filter defined on this page.
#### NOT
The grammar for a NOT filter is as follows:
<code>"filter": {
"type": "not",
"field": <filter>
}
</code>
The filter specified at field can be any other filter defined on this page.
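To illustrate how these compose (using hypothetical dimensions), the following matches rows where `gender` is `male` and `country` is *not* `United States`, i.e. the equivalent of `WHERE gender = 'male' AND country != 'United States'`:
<code>"filter": {
    "type": "and",
    "fields": [
        {
            "type": "selector",
            "dimension": "gender",
            "value": "male"
        },
        {
            "type": "not",
            "field": {
                "type": "selector",
                "dimension": "country",
                "value": "United States"
            }
        }
    ]
}
</code>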
### JavaScript filter
The JavaScript filter matches a dimension against the specified JavaScript function predicate. The filter matches values for which the function returns true.
The function takes a single argument, the dimension value, and returns either true or false.
<code>{
"type" : "javascript",
"dimension" : <dimension_string>,
"function" : "function(value) { <...> }"
}
</code>
**Example**
The following matches any dimension values for the dimension `name` between `'bar'` and `'foo'`:
<code>{
"type" : "javascript",
"dimension" : "name",
"function" : "function(x) { return(x >= 'bar' && x <= 'foo') }"
}
</code>
Firehoses describe the data stream source. They are pluggable and thus the configuration schema can and will vary based on the `type` of the firehose.
|Field|Type|Description|Required|
|-----|----|-----------|--------|
|type|String|Specifies the type of firehose. Each value will have its own configuration schema; firehoses packaged with Druid are described [here](https://github.com/metamx/druid/wiki/Firehose#available-firehoses).|yes|
We describe the configuration of the Kafka firehose from the example below, but check [here](https://github.com/metamx/druid/wiki/Firehose#available-firehoses) for more information about the various firehoses that are available in Druid.
- `consumerProps` is a map of properties for the Kafka consumer. The JSON object is converted into a Properties object and passed along to the Kafka consumer.
- `feed` is the feed that the Kafka consumer should read from.
- `parser` represents a parser that knows how to convert from String representations into the required `InputRow` representation that Druid uses. This is a potentially reusable piece that can be found in many of the firehoses that are based on text streams. The spec in the example describes a JSON feed (new-line delimited objects), with a timestamp column called “timestamp” in ISO8601 format and that it should not include the dimension “value” when processing. More information about the options available for the parser are available [here](https://github.com/metamx/druid/wiki/Firehose#parsing-data).
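As a point of reference, a minimal Kafka firehose spec along those lines might look like the sketch below. It is adapted from the realtime ingestion example elsewhere in these docs; the feed name and consumer properties are illustrative and would need to match your own Kafka setup.
<code>"firehose" : {
    "type" : "kafka-0.7.2",
    "consumerProps" : {
        "zk.connect" : "localhost:2181",
        "groupid" : "druid-example",
        "fetch.size" : "1048586"
    },
    "feed" : "druidtest",
    "parser" : {
        "timestampSpec" : { "column" : "timestamp", "format" : "iso" },
        "data" : { "format" : "json" },
        "dimensionExclusions" : ["value"]
    }
}
</code>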
Available Firehoses
-------------------
There are several firehoses readily available in Druid; some are meant for examples, while others can be used directly in a production environment.
#### KafkaFirehose
This firehose acts as a Kafka consumer and ingests data from Kafka.
#### StaticS3Firehose
This firehose ingests events from a predefined list of S3 objects.
#### TwitterSpritzerFirehose
See [[Examples]]. This firehose connects directly to the twitter spritzer data stream.
#### RandomFirehose
See [[Examples]]. This firehose creates a stream of random numbers.
#### RabbitMqFirehose
This firehose ingests events from a defined RabbitMQ queue.
Parsing Data
------------
There are several ways to parse data.
#### StringInputRowParser
This parser converts Strings.
#### MapInputRowParser
This parser converts flat, key/value pair maps.
The granularity field determines how data gets bucketed across the time dimension, i.e. how it gets aggregated by hour, day, minute, etc.
It can be specified either as a string for simple granularities or as an object for arbitrary granularities.
### Simple Granularities
Simple granularities are specified as a string and bucket timestamps by their UTC time (i.e. days start at 00:00 UTC).
Supported granularity strings are: `all`, `none`, `minute`, `fifteen_minute`, `thirty_minute`, `hour` and `day`
* **`all`** buckets everything into a single bucket
* **`none`** does not bucket data (it actually uses the granularity of the index - minimum here is `none` which means millisecond granularity). Using `none` in a [[timeseries query|TimeSeriesQuery]] is currently not recommended (the system will try to generate 0 values for all milliseconds that didn’t exist, which is often a lot).
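As a simple illustration, a query that buckets results by UTC day would just set:
<code>"granularity": "day"</code>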
### Duration Granularities
Duration granularities are specified as an exact duration in milliseconds and timestamps are returned as UTC.
They also support specifying an optional origin, which defines where to start counting time buckets from (defaults to 1970-01-01T00:00:00Z).
<code>{"type": "duration", "duration": "7200000"}</code>
This chunks up every 2 hours.
<code>{"type": "duration", "duration": "3600000", "origin": "2012-01-01T00:30:00Z"}</code>
This chunks up every hour on the half-hour.
### Period Granularities
Period granularities are specified as arbitrary period combinations of years, months, weeks, hours, minutes and seconds (e.g. P2W, P3M, PT1H30M, PT0.750S) in ISO8601 format.
They support specifying a time zone which determines where period boundaries start and also determines the timezone of the returned timestamps.
By default years start on the first of January, months start on the first of the month and weeks start on Mondays unless an origin is specified.
Time zone is optional (defaults to UTC)
Origin is optional (defaults to 1970-01-01T00:00:00 in the given time zone)
<code>{"type": "period", "period": "P2D", "timeZone": "America/Los_Angeles"}</code>
This will bucket by two day chunks in the Pacific timezone.
<code>{"type": "period", "period": "P3M", "timeZone": "America/Los_Angeles", "origin": "2012-02-01T00:00:00-08:00"}</code>
This will bucket by 3 month chunks in the Pacific timezone where the three-month quarters are defined as starting from February.
Supported time zones: timezone support is provided by the [Joda Time library](http://www.joda.org), which uses the standard IANA time zones. [Joda Time supported timezones](http://joda-time.sourceforge.net/timezones.html)
These types of queries take a groupBy query object and return an array of JSON objects where each object represents a grouping asked for by the query.
An example groupBy query object is shown below:
<pre>
<code>
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["dim1", "dim2"],
  "limitSpec": {
    "type": "default",
    "limit": 5000,
    "columns": ["dim1", "metric1"]
  },
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "sample_dimension1",
        "value": "sample_value1"
      },
      {
        "type": "or",
        "fields": [
          {
            "type": "selector",
            "dimension": "sample_dimension2",
            "value": "sample_value2"
          },
          {
            "type": "selector",
            "dimension": "sample_dimension3",
            "value": "sample_value3"
          }
        ]
      }
    ]
  },
  "aggregations": [
    {
      "type": "longSum",
      "name": "sample_name1",
      "fieldName": "sample_fieldName1"
    },
    {
      "type": "doubleSum",
      "name": "sample_name2",
      "fieldName": "sample_fieldName2"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "sample_name1",
          "fieldName": "sample_fieldName1"
        },
        {
          "type": "fieldAccess",
          "name": "sample_name2",
          "fieldName": "sample_fieldName2"
        }
      ]
    }
  ],
  "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"],
  "having": {
    "type": "greaterThan",
    "aggregation": "sample_name1",
    "value": 0
  }
}
</code>
</pre>
The main parts of a groupBy query are:
|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be “groupBy”; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String defining the data source to query, very similar to a table in a relational database|yes|
|dimensions|A JSON list of dimensions to do the groupBy over|yes|
|orderBy|See [[OrderBy]].|no|
|having|See [[Having]].|no|
|granularity|Defines the granularity of the query. See [[Granularities]]|yes|
|filter|See [[Filters]]|no|
|aggregations|See [[Aggregations]]|yes|
|postAggregations|See [[Post Aggregations]]|no|
|intervals|A JSON list of ISO-8601 interval strings. This defines the time ranges to run the query over.|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|
To pull it all together, the above query would return *n\*m* data points, up to a maximum of 5000 points, where n is the cardinality of the `dim1` dimension and m is the cardinality of the `dim2` dimension, for each day between 2012-01-01 and 2012-01-03, from the `sample_datasource` table. Each data point contains the (long) sum of `sample_fieldName1`, the (double) sum of `sample_fieldName2`, and the (double) result of dividing `sample_fieldName1` by `sample_fieldName2` for a particular grouping of `dim1` and `dim2`; per the having clause, only groupings whose `sample_name1` value is greater than 0 are returned. The output looks like this:
<pre>
<code>
[ {
  "version" : "v1",
  "timestamp" : "2012-01-01T00:00:00.000Z",
  "event" : {
    "dim1" : <some_dim1_value>,
    "dim2" : <some_dim2_value>,
    "sample_name1" : <some_sample_name1_value>,
    "sample_name2" : <some_sample_name2_value>,
    "sample_divide" : <some_sample_divide_value>
  }
}, {
  "version" : "v1",
  "timestamp" : "2012-01-01T00:00:00.000Z",
  "event" : {
    "dim1" : <some_other_dim1_value>,
    "dim2" : <some_other_dim2_value>,
    "sample_name1" : <some_other_sample_name1_value>,
    "sample_name2" : <some_other_sample_name2_value>,
    "sample_divide" : <some_other_sample_divide_value>
  }
} ]
</code>
</pre>
A having clause is a JSON object identifying which rows from a groupBy query should be returned, by specifying conditions on aggregated values.
It is essentially the equivalent of the HAVING clause in SQL.
Druid supports the following types of having clauses.
### Numeric filters
The simplest having clause is a numeric filter.
Numeric filters can be used as the base filters for more complex boolean expressions of filters.
#### Equal To
The equalTo filter will match rows with a specific aggregate value.
The grammar for an `equalTo` filter is as follows:
<code>"having": {
"type": "equalTo",
"aggregation": <aggregate_metric>,
"value": <numeric_value>
}
</code>
This is the equivalent of `HAVING <aggregate> = <value>`.
#### Greater Than
The greaterThan filter will match rows with aggregate values greater than the given value.
The grammar for a `greaterThan` filter is as follows:
<code>"having": {
"type": "greaterThan",
"aggregation": <aggregate_metric>,
"value": <numeric_value>
}
</code>
This is the equivalent of `HAVING <aggregate> > <value>`.
#### Less Than
The lessThan filter will match rows with aggregate values less than the specified value.
The grammar for a `lessThan` filter is as follows:
<code>"having": {
"type": "lessThan",
"aggregation": <aggregate_metric>,
"value": <numeric_value>
}
</code>
This is the equivalent of `HAVING <aggregate> < <value>`.
### Logical expression filters
#### AND
The grammar for an AND filter is as follows:
<code>"having": {
"type": "and",
"havingSpecs": [<having clause>, <having clause>, ...]
}
</code>
The having clauses in `havingSpecs` can be any other having clause defined on this page.
#### OR
The grammar for an OR filter is as follows:
<code>"having": {
"type": "or",
"havingSpecs": [<having clause>, <having clause>, ...]
}
</code>
The having clauses in `havingSpecs` can be any other having clause defined on this page.
#### NOT
The grammar for a NOT filter is as follows:
<code>"having": {
"type": "not",
"havingSpec": <having clause>
}
</code>
The having clause specified at `havingSpec` can be any other having clause defined on this page.
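To illustrate how having clauses compose (using hypothetical aggregate names and thresholds), the following keeps only rows where the aggregate `sample_name1` is greater than 0 and `sample_name2` is not equal to 100:
<code>"having": {
    "type": "and",
    "havingSpecs": [
        {
            "type": "greaterThan",
            "aggregation": "sample_name1",
            "value": 0
        },
        {
            "type": "not",
            "havingSpec": {
                "type": "equalTo",
                "aggregation": "sample_name2",
                "value": 100
            }
        }
    ]
}
</code>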
Druid is an open-source analytics datastore designed for realtime, exploratory queries on large-scale data sets (100’s of billions of entries, 100’s of TB of data). Druid provides cost-effective, always-on, realtime data ingestion and arbitrary data exploration.
- Check out some [[Examples]]
- Try out Druid with our Getting Started [Tutorial](https://github.com/metamx/druid/wiki/Tutorial%3A-A-First-Look-at-Druid)
- Learn more by reading the [White Paper](http://static.druid.io/docs/druid.pdf)
Why Druid?
----------
Druid was originally created to resolve query latency issues seen with trying to use Hadoop to power an interactive service. Hadoop has shown the world that it’s possible to house your data warehouse on commodity hardware for a fraction of the price of typical solutions. As people adopt Hadoop for their data warehousing needs, they find two things.
1. They can now query all of their data in a fairly flexible manner and answer any question they have
2. The queries take a long time
The first is the joy that everyone feels the first time they get Hadoop running. The second is what they realize after they have used Hadoop interactively for a while: Hadoop is optimized for throughput, not latency. Druid is a system that you can set up in your organization next to Hadoop. It provides the ability to access your data in an interactive slice-and-dice fashion. It trades off some query flexibility and takes over the storage format in order to provide the speed.
Druid is especially useful if you are summarizing your data sets and then querying the summarizations. If you put your summarizations into Druid, you will get quick queryability out of a system that you can be confident will scale up as your data volumes increase. Deployments have ingested and aggregated up to 2TB of data per hour at peak, in real-time.
We have more details about the general design of the system and why you might want to use it in our [White Paper](http://static.druid.io/docs/druid.pdf) or in our [[Design]] doc.
The data store world is vast, confusing and constantly in flux. This page is meant to help potential evaluators decide whether Druid is a good fit for the problem one needs to solve. If anything about it is incorrect, please provide that feedback on the mailing list or via some other means, and we will fix this page.
#### When Druid?
* You need to do interactive, fast exploration of large amounts of data
* You need analytics (not a key-value store)
* You have a lot of data (10s of billions of events added per day, 10s of TB of data added per day)
* You want to do your analysis on data as it’s happening (realtime)
* Your store needs to be always-on, 24x7x365, and years into the future.
#### Not Druid?
* The amount of data you have can easily be handled by MySQL
* You’re querying for individual entries or doing lookups (not analytics)
* Batch is good enough
* Canned queries are good enough
* Downtime is no big deal
#### Druid vs…
* [[Druid-vs-Impala-or-Shark]]
* [[Druid-vs-Redshift]]
* [[Druid-vs-Vertica]]
* [[Druid-vs-Cassandra]]
* [[Druid-vs-Hadoop]]
Key Features
------------
- **Designed for Analytics** - Druid is built for exploratory analytics for OLAP workflows (streamalytics). It supports a variety of filters, aggregators and query types and provides a framework for plugging in new functionality. Users have leveraged Druid’s infrastructure to develop features such as top K queries and histograms.
- **Interactive Queries** - Druid’s low latency data ingestion architecture allows events to be queried milliseconds after they are created. Druid’s query latency is optimized by only reading and scanning exactly what is needed. Aggregate and filter on data without sitting around waiting for results.
- **Highly Available** - Druid is used to back SaaS implementations that need to be up all the time. Your data is still available and queryable during system updates. Scale up or down without data loss.
- **Scalable** - Existing Druid deployments handle billions of events and terabytes of data per day. Druid is designed to be petabyte scale.
Disclaimer: We are still in the process of finalizing the indexing service and these configs are prone to change at any time. We will announce when we feel the indexing service and the configurations described are stable.
The indexing service is a distributed task/job queue. It accepts requests in the form of [[Tasks]] and executes those tasks across a set of worker nodes. Worker capacity can be automatically adjusted based on the number of tasks pending in the system. The indexing service is highly available, has built-in retry logic, and can back up per-task logs in deep storage.
The indexing service is composed of two main components: a coordinator node that manages task distribution and worker capacity, and worker nodes that execute tasks in separate JVMs.
Preamble
--------
The truth is, the indexing service is an experience that is difficult to characterize with words. When they asked me to write this preamble, I was taken aback. I wasn’t quite sure what exactly to write or how to describe this… entity. I accepted the job, as much for the challenge and inner growth as the money, and took to the mountains for reflection. Six months later, I knew I had it, I was done and had achieved the next euphoric victory in the continuous struggle that plagues my life. But, enough about me. This is about the indexing service.
The indexing service is philosophical transcendence, an infallible truth that will shape your soul, mold your character, and define your reality. The indexing service is creating world peace, playing with puppies, unwrapping presents on Christmas morning, cradling a loved one, and beating Goro in Mortal Kombat for the first time. The indexing service is sustainable economic growth, global propensity, and a world of transparent financial transactions. The indexing service is a true belieber. The indexing service is panicking because you forgot you signed up for a course and the big exam is in a few minutes, only to wake up and realize it was all a dream. What is the indexing service? More like what isn’t the indexing service. The indexing service is here and it is ready, but are you?
Indexer Coordinator Node
------------------------
The indexer coordinator node exposes HTTP endpoints where tasks can be submitted by posting a JSON blob to the relevant endpoint. It can be started by launching IndexerCoordinatorMain.java. The indexer coordinator node can operate in local mode or remote mode. In local mode, the coordinator and worker run on the same host and port. In remote mode, worker processes run on separate hosts and ports.
Tasks can be submitted via POST requests to:
`http://<COORDINATOR_IP>:<port>/druid/indexer/v1/task`
Tasks can be cancelled via POST requests to:
`http://<COORDINATOR_IP>:<port>/druid/indexer/v1/task/{taskId}/shutdown`
Issuing the cancel request once sends a graceful shutdown request. Graceful shutdowns may not stop a task right away, but instead issue a safe stop command at a point deemed least impactful to the system. Issuing the cancel request twice in succession will `kill -9` the task.
Task statuses can be retrieved via GET requests to:
`http://<COORDINATOR_IP>:<port>/druid/indexer/v1/task/{taskId}/status`
Task segments can be retrieved via GET requests to:
`http://<COORDINATOR_IP>:<port>/druid/indexer/v1/task/{taskId}/segments`
When a task is submitted, the coordinator creates a lock over the data source and interval of the task. The coordinator also stores the task in a MySQL database table. The database table is read at startup time to bootstrap any tasks that may have been submitted to the coordinator but may not yet have been executed.
The coordinator also exposes a simple UI to show what tasks are currently running on what nodes at
`http://<COORDINATOR_IP>:<port>/static/console.html`
#### Task Execution
The coordinator retrieves worker setup metadata from the Druid [[MySQL]] config table. This metadata contains information about the version of workers to create, the maximum and minimum number of workers in the cluster at one time, and additional information required to automatically create workers.
Tasks are assigned to workers by creating entries under specific /tasks paths associated with a worker, similar to how the Druid master node assigns segments to compute nodes. See [Worker Configuration](Indexing-Service#configuration-1). Once a worker picks up a task, it deletes the task entry and announces a task status under a /status path associated with the worker. Tasks are submitted to a worker until the worker hits capacity. If all workers in a cluster are at capacity, the indexer coordinator node automatically creates new worker resources.
#### Autoscaling
The Autoscaling mechanisms currently in place are tightly coupled with our deployment infrastructure but the framework should be in place for other implementations. We are highly open to new implementations or extensions of the existing mechanisms. In our own deployments, worker nodes are Amazon AWS EC2 nodes and they are provisioned to register themselves in a [galaxy](https://github.com/ning/galaxy) environment.
The Coordinator node controls the number of workers in the cluster according to a worker setup spec that is submitted via a POST request to the indexer at:
`http://<COORDINATOR_IP>:<port>/druid/indexer/v1/worker/setup`
A sample worker setup spec is shown below:
<code>{
"minVersion":"some_version",
"minNumWorkers":"0",
"maxNumWorkers":"10",
"nodeData": {
"type":"ec2",
"amiId":"ami-someId",
"instanceType":"m1.xlarge",
"minInstances":"1",
"maxInstances":"1",
"securityGroupIds":["securityGroupIds"],
"keyName":"keyName"
},
"userData":{
"classType":"galaxy",
"env":"druid",
"version":"druid_version",
"type":"sample_cluster/worker"
}
}
</code>
Issuing a GET request at the same URL will return the worker setup spec that is currently in place. The worker setup spec list above is just a sample, and it is possible to write worker setup specs for other deployment environments. A description of the worker setup spec is shown below.
|Property|Description|Default|
|--------|-----------|-------|
|`minVersion`|The coordinator only assigns tasks to workers with a version greater than the minVersion. If this is not specified, the minVersion will be the same as the coordinator version.|none|
|`minNumWorkers`|The minimum number of workers that can be in the cluster at any given time.|0|
|`maxNumWorkers`|The maximum number of workers that can be in the cluster at any given time.|0|
|`nodeData`|A JSON object that contains metadata about new nodes to create.|none|
|`userData`|A JSON object that contains metadata about how the node should register itself on startup. This data is sent with node creation requests.|none|
For more information about configuring Auto-scaling, see [Auto-Scaling Configuration](https://github.com/metamx/druid/wiki/Indexing-Service#auto-scaling-configuration).
#### Running
Indexer Coordinator nodes can be run using the `com.metamx.druid.indexing.coordinator.http.IndexerCoordinatorMain` class.
#### Configuration
Indexer Coordinator nodes require [basic service configuration](https://github.com/metamx/druid/wiki/Configuration#basic-service-configuration). In addition, there are several extra configurations that are required.
<code>
-Ddruid.zk.paths.indexer.announcementsPath=/druid/indexer/announcements
-Ddruid.zk.paths.indexer.leaderLatchPath=/druid/indexer/leaderLatchPath
-Ddruid.zk.paths.indexer.statusPath=/druid/indexer/status
-Ddruid.zk.paths.indexer.tasksPath=/druid/demo/indexer/tasks
-Ddruid.indexer.runner=remote
-Ddruid.indexer.taskDir=/mnt/persistent/task/
-Ddruid.indexer.configTable=sample_config
-Ddruid.indexer.workerSetupConfigName=worker_setup
-Ddruid.indexer.strategy=ec2
-Ddruid.indexer.hadoopWorkingPath=/tmp/druid-indexing
-Ddruid.indexer.logs.s3bucket=some_bucket
-Ddruid.indexer.logs.s3prefix=some_prefix
</code>
The indexing service requires some additional Zookeeper configs.
|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.paths.indexer.announcementsPath`|The base path where workers announce themselves.|none|
|`druid.zk.paths.indexer.leaderLatchPath`|The base that coordinator nodes use to determine a leader.|none|
|`druid.zk.paths.indexer.statusPath`|The base path where workers announce task statuses.|none|
|`druid.zk.paths.indexer.tasksPath`|The base path where the coordinator assigns new tasks.|none|
There are several additional configs that are required to run tasks.
|Property|Description|Default|
|--------|-----------|-------|
|`druid.indexer.runner`|Indicates whether tasks should be run locally or in a distributed environment. “local” or “remote”.|local|
|`druid.indexer.taskDir`|Intermediate temporary directory that tasks may use.|none|
|`druid.indexer.configTable`|The MySQL config table where misc configs live.|none|
|`druid.indexer.strategy`|The autoscaling strategy to use.|noop|
|`druid.indexer.hadoopWorkingPath`|Intermediate temporary hadoop working directory that certain index tasks may use.|none|
|`druid.indexer.logs.s3bucket`|S3 bucket to store logs.|none|
|`druid.indexer.logs.s3prefix`|S3 key prefix to store logs.|none|
#### Console
The indexer console can be used to view pending tasks, running tasks, available workers, and recent worker creation and termination. The console can be accessed at:
`http://<COORDINATOR_IP>:8080/static/console.html`
Worker Node
-----------
The worker node executes submitted tasks. Workers run tasks in separate JVMs.
#### Running
Worker nodes can be run using the `com.metamx.druid.indexing.worker.http.WorkerMain` class. Worker nodes can automatically be created by the Indexer Coordinator as part of autoscaling.
#### Configuration
Worker nodes require [basic service configuration](https://github.com/metamx/druid/wiki/Configuration#basic-service-configuration). In addition, there are several extra configurations that are required.
<code>
-Ddruid.worker.version=0
-Ddruid.worker.capacity=3
-Ddruid.indexer.threads=3
-Ddruid.indexer.taskDir=/mnt/persistent/task/
-Ddruid.indexer.hadoopWorkingPath=/tmp/druid-indexing
-Ddruid.worker.masterService=druid:sample_cluster:indexer
-Ddruid.indexer.fork.hostpattern=<IP>:%d
-Ddruid.indexer.fork.startport=8080
-Ddruid.indexer.fork.main=com.metamx.druid.indexing.worker.executor.ExecutorMain
-Ddruid.indexer.fork.opts="-server -Xmx1g -Xms1g -XX:NewSize=256m -XX:MaxNewSize=256m"
-Ddruid.indexer.fork.property.druid.service=druid/sample_cluster/executor
# These configs are the same configs you would set for basic service configuration, just with a different prefix
-Ddruid.indexer.fork.property.druid.monitoring.monitorSystem=false
-Ddruid.indexer.fork.property.druid.computation.buffer.size=268435456
-Ddruid.indexer.fork.property.druid.indexer.taskDir=/mnt/persistent/task/
-Ddruid.indexer.fork.property.druid.processing.formatString=processing-%s
-Ddruid.indexer.fork.property.druid.processing.numThreads=1
-Ddruid.indexer.fork.property.druid.server.maxSize=0
-Ddruid.indexer.fork.property.druid.request.logging.dir=request_logs/
</code>
Many of the configurations for workers are similar to those for [basic service configuration](https://github.com/metamx/druid/wiki/Configuration#basic-service-configuration), but with a different config prefix. Below we describe the unique worker configs.
|Property|Description|Default|
|--------|-----------|-------|
|`druid.worker.version`|Version identifier for the worker.|0|
|`druid.worker.capacity`|Maximum number of tasks the worker can accept.|1|
|`druid.indexer.threads`|Number of processing threads per worker.|1|
|`druid.worker.masterService`|Name of the indexer coordinator used for service discovery.|none|
|`druid.indexer.fork.hostpattern`|The format of the host name.|none|
|`druid.indexer.fork.startport`|Port that child JVMs start from.|none|
|`druid.indexer.fork.opts`|JVM options for child JVMs.|none|
### R
- [RDruid](https://github.com/metamx/RDruid) - Druid connector for R
Community Libraries
-------------------
Some great folks have written their own libraries to interact with Druid:
#### Ruby
- [madvertise/ruby-druid](https://github.com/madvertise/ruby-druid) - A ruby client for Druid
#### Helper Libraries
- [madvertise/druid-dumbo](https://github.com/madvertise/druid-dumbo) - Scripts to help generate batch configs for the ingestion of data into Druid
- [housejester/druid-test-harness](https://github.com/housejester/druid-test-harness) - A set of scripts to simplify standing up some servers and seeing how things work
Once you have a realtime node working, it is time to load your own data to see how Druid performs.
Druid can ingest data in three ways: via Kafka and a realtime node, via the indexing service, and via the Hadoop batch loader. Data is ingested in realtime using a [[Firehose]].
## Create Config Directories ##
Each type of node needs its own config file and directory, so create them as subdirectories under the druid directory.
```bash
mkdir config
mkdir config/realtime
mkdir config/master
mkdir config/compute
mkdir config/broker
```
## Loading Data with Kafka ##
[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/master/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how Druid communicates with Kafka. Using this [[Firehose]] with the right configuration, we can import data into Druid in realtime without writing any code. To load data to a realtime node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [[Realtime]] node.
### Booting Kafka ###
Instructions for booting a Zookeeper and then Kafka cluster are available [here](http://kafka.apache.org/07/quickstart.html).
1. Download Apache Kafka 0.7.2 from [http://kafka.apache.org/downloads.html](http://kafka.apache.org/downloads.html)
```bash
wget http://apache.spinellicreations.com/incubator/kafka/kafka-0.7.2-incubating/kafka-0.7.2-incubating-src.tgz
tar -xvzf kafka-0.7.2-incubating-src.tgz
cd kafka-0.7.2-incubating-src
```
2. Build Kafka
```bash
./sbt update
./sbt package
```
3. Boot Kafka
```bash
cat config/zookeeper.properties
bin/zookeeper-server-start.sh config/zookeeper.properties
# in a new console
bin/kafka-server-start.sh config/server.properties
```
4. Launch the console producer (so you can type in JSON kafka messages in a bit)
```bash
bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic druidtest
```
### Launching a Realtime Node
1. Create a valid configuration file similar to this called config/realtime/runtime.properties:
```
druid.host=0.0.0.0:8080
druid.port=8080
com.metamx.emitter.logging=true
druid.processing.formatString=processing_%s
druid.processing.numThreads=1
druid.processing.buffer.sizeBytes=10000000
#emitting, opaque marker
druid.service=example
druid.request.logging.dir=/tmp/example/log
druid.realtime.specFile=realtime.spec
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=debug
# below are dummy values when operating a realtime only node
druid.processing.numThreads=3
com.metamx.aws.accessKey=dummy_access_key
com.metamx.aws.secretKey=dummy_secret_key
druid.pusher.s3.bucket=dummy_s3_bucket
druid.zk.service.host=localhost
druid.server.maxSize=300000000000
druid.zk.paths.base=/druid
druid.database.segmentTable=prod_segments
druid.database.user=user
druid.database.password=diurd
druid.database.connectURI=
druid.host=127.0.0.1:8080
```
2. Create a valid realtime configuration file similar to this called realtime.spec:
```json
[{
"schema" : { "dataSource":"druidtest",
"aggregators":[ {"type":"count", "name":"impressions"},
{"type":"doubleSum","name":"wp","fieldName":"wp"}],
"indexGranularity":"minute",
"shardSpec" : { "type": "none" } },
"config" : { "maxRowsInMemory" : 500000,
"intermediatePersistPeriod" : "PT10m" },
"firehose" : { "type" : "kafka-0.7.2",
"consumerProps" : { "zk.connect" : "localhost:2181",
"zk.connectiontimeout.ms" : "15000",
"zk.sessiontimeout.ms" : "15000",
"zk.synctime.ms" : "5000",
"groupid" : "topic-pixel-local",
"fetch.size" : "1048586",
"autooffset.reset" : "largest",
"autocommit.enable" : "false" },
"feed" : "druidtest",
"parser" : { "timestampSpec" : { "column" : "utcdt", "format" : "iso" },
"data" : { "format" : "json" },
"dimensionExclusions" : ["wp"] } },
"plumber" : { "type" : "realtime",
"windowPeriod" : "PT10m",
"segmentGranularity":"hour",
"basePersistDirectory" : "/tmp/realtime/basePersist",
"rejectionPolicy": {"type": "messageTime"} }
}]
```
3. Launch the realtime node
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.realtime.specFile=config/realtime/realtime.spec \
-classpath lib/*:config/realtime com.metamx.druid.realtime.RealtimeMain
```
4. Paste data into the Kafka console producer
```json
{"utcdt": "2010-01-01T01:01:01", "wp": 1000, "gender": "male", "age": 100}
{"utcdt": "2010-01-01T01:01:02", "wp": 2000, "gender": "female", "age": 50}
{"utcdt": "2010-01-01T01:01:03", "wp": 3000, "gender": "male", "age": 20}
{"utcdt": "2010-01-01T01:01:04", "wp": 4000, "gender": "female", "age": 30}
{"utcdt": "2010-01-01T01:01:05", "wp": 5000, "gender": "male", "age": 40}
```
5. Watch the events as they are ingested by Druid's realtime node
```bash
...
2013-06-17 21:41:55,569 INFO [Global--0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2013-06-17T21:41:55.569Z","service":"example","host":"127.0.0.1","metric":"events/processed","value":5,"user2":"druidtest"}]
...
```
6. In a new console, edit a file called query.body:
```json
{
"queryType": "groupBy",
"dataSource": "druidtest",
"granularity": "all",
"dimensions": [],
"aggregations": [
{ "type": "count", "name": "rows" },
{"type": "longSum", "name": "imps", "fieldName": "impressions"},
{"type": "doubleSum", "name": "wp", "fieldName": "wp"}
],
"intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```
7. Submit the query via curl
```bash
curl -X POST "http://localhost:8080/druid/v2/?pretty" \
-H 'content-type: application/json' -d @query.body
```
8. View Result!
```json
[ {
"timestamp" : "2010-01-01T01:01:00.000Z",
"result" : {
"imps" : 20,
"wp" : 60000.0,
"rows" : 5
}
} ]
```
Now you're ready for [[Querying Your Data]]!
## Loading Data with the HadoopDruidIndexer ##
Historical data can be loaded via a Hadoop job.
The setup for a single node, 'standalone' Hadoop cluster is available at [http://hadoop.apache.org/docs/stable/single_node_setup.html](http://hadoop.apache.org/docs/stable/single_node_setup.html).
### Setup MySQL ###
1. If you don't already have it, download MySQL Community Server here: [http://dev.mysql.com/downloads/mysql/](http://dev.mysql.com/downloads/mysql/)
2. Install MySQL
3. Create a druid user and database
```bash
mysql -u root
```
```sql
GRANT ALL ON druid.* TO 'druid'@'localhost' IDENTIFIED BY 'diurd';
CREATE database druid;
```
The [[Master]] node will create the tables it needs based on its configuration.
### Make sure you have ZooKeeper Running ###
Make sure that you have a zookeeper instance running. If you followed the instructions for Kafka, it is probably running. If you are unsure if you have zookeeper running, try running
```bash
ps auxww | grep zoo | grep -v grep
```
If you get any result back, then zookeeper is most likely running. If you haven't setup Kafka or do not have zookeeper running, then you can download it and start it up with
```bash
curl http://www.motorlogy.com/apache/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz -o zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
cd ..
```
### Launch a Master Node ###
If you've already set up a realtime node, be aware that although you can run multiple node types on one physical computer, you must assign them unique ports. Having used 8080 for the [[Realtime]] node, we use 8081 for the [[Master]].
1. Setup a configuration file called config/master/runtime.properties similar to:
```bash
druid.host=0.0.0.0:8081
druid.port=8081
com.metamx.emitter.logging=true
druid.processing.formatString=processing_%s
druid.processing.numThreads=1
druid.processing.buffer.sizeBytes=10000000
#emitting, opaque marker
druid.service=example
druid.master.startDelay=PT60s
druid.request.logging.dir=/tmp/example/log
druid.realtime.specFile=realtime.spec
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=debug
# below are dummy values when operating a realtime only node
druid.processing.numThreads=3
com.metamx.aws.accessKey=dummy_access_key
com.metamx.aws.secretKey=dummy_secret_key
druid.pusher.s3.bucket=dummy_s3_bucket
druid.zk.service.host=localhost
druid.server.maxSize=300000000000
druid.zk.paths.base=/druid
druid.database.segmentTable=prod_segments
druid.database.user=druid
druid.database.password=diurd
druid.database.connectURI=jdbc:mysql://localhost:3306/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
druid.database.ruleTable=rules
druid.database.configTable=config
# Path on local FS for storage of segments; dir will be created if needed
druid.paths.indexCache=/tmp/druid/indexCache
# Path on local FS for storage of segment metadata; dir will be created if needed
druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
```
2. Launch the [[Master]] node
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-classpath lib/*:config/master \
com.metamx.druid.http.MasterMain
```
### Launch a Compute/Historical Node ###
1. Create a configuration file in config/compute/runtime.properties similar to:
```bash
druid.host=0.0.0.0:8082
druid.port=8082
com.metamx.emitter.logging=true
druid.processing.formatString=processing_%s
druid.processing.numThreads=1
druid.processing.buffer.sizeBytes=10000000
#emitting, opaque marker
druid.service=example
druid.request.logging.dir=/tmp/example/log
druid.realtime.specFile=realtime.spec
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=debug
# below are dummy values when operating a realtime only node
druid.processing.numThreads=3
com.metamx.aws.accessKey=dummy_access_key
com.metamx.aws.secretKey=dummy_secret_key
druid.pusher.s3.bucket=dummy_s3_bucket
druid.zk.service.host=localhost
druid.server.maxSize=300000000000
druid.zk.paths.base=/druid
druid.database.segmentTable=prod_segments
druid.database.user=druid
druid.database.password=diurd
druid.database.connectURI=jdbc:mysql://localhost:3306/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
druid.database.ruleTable=rules
druid.database.configTable=config
# Path on local FS for storage of segments; dir will be created if needed
druid.paths.indexCache=/tmp/druid/indexCache
# Path on local FS for storage of segment metadata; dir will be created if needed
druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
# Setup local storage mode
druid.pusher.local.storageDirectory=/tmp/druid/localStorage
druid.pusher.local=true
```
2. Launch the compute node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-classpath lib/*:config/compute \
com.metamx.druid.http.ComputeMain
```
### Create a File of Records ###
We can use the same records we have been, in a file called records.json:
```json
{"utcdt": "2010-01-01T01:01:01", "wp": 1000, "gender": "male", "age": 100}
{"utcdt": "2010-01-01T01:01:02", "wp": 2000, "gender": "female", "age": 50}
{"utcdt": "2010-01-01T01:01:03", "wp": 3000, "gender": "male", "age": 20}
{"utcdt": "2010-01-01T01:01:04", "wp": 4000, "gender": "female", "age": 30}
{"utcdt": "2010-01-01T01:01:05", "wp": 5000, "gender": "male", "age": 40}
```
### Run the Hadoop Job ###
Now it's time to run the Hadoop [[Batch-ingestion]] job, HadoopDruidIndexer, which will fill a historical [[Compute]] node with data. First we'll need to configure the job.
1. Create a config called batchConfig.json similar to:
```json
{
"dataSource": "druidtest",
"timestampColumn": "utcdt",
"timestampFormat": "iso",
"dataSpec": {
"format": "json",
"dimensions": ["gender", "age"]
},
"granularitySpec": {
"type":"uniform",
"intervals":["2010-01-01T01/PT1H"],
"gran":"hour"
},
"pathSpec": { "type": "static",
"paths": "/Users/rjurney/Software/druid/records.json" },
"rollupSpec": { "aggs":[ {"type":"count", "name":"impressions"},
{"type":"doubleSum","name":"wp","fieldName":"wp"}
],
"rollupGranularity": "minute"},
"workingPath": "/tmp/working_path",
"segmentOutputPath": "/tmp/segments",
"leaveIntermediate": "false",
"partitionsSpec": {
"targetPartitionSize": 5000000
},
"updaterJobSpec": {
"type":"db",
"connectURI":"jdbc:mysql://localhost:3306/druid",
"user":"druid",
"password":"diurd",
"segmentTable":"prod_segments"
}
}
```
2. Now run the job, with the config pointing at batchConfig.json:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=realtime.spec -classpath lib/* com.metamx.druid.indexer.HadoopDruidIndexerMain batchConfig.json
```
You can now move on to [[Querying Your Data]]!
Master
======
The Druid master node is primarily responsible for segment management and distribution. More specifically, the Druid master node communicates to compute nodes to load or drop segments based on configurations. The Druid master is responsible for loading new segments, dropping outdated segments, managing segment replication, and balancing segment load.
The Druid master runs periodically and the time between each run is a configurable parameter. Each time the Druid master runs, it assesses the current state of the cluster before deciding on the appropriate actions to take. Similar to the broker and compute nodes, the Druid master maintains a connection to a Zookeeper cluster for current cluster information. The master also maintains a connection to a database containing information about available segments and rules. Available segments are stored in a segment table and list all segments that should be loaded in the cluster. Rules are stored in a rule table and indicate how segments should be handled.
Before any unassigned segments are serviced by compute nodes, the available compute nodes for each tier are first sorted in terms of capacity, with least capacity servers having the highest priority. Unassigned segments are always assigned to the nodes with least capacity to maintain a level of balance between nodes. The master does not directly communicate with a compute node when assigning it a new segment; instead the master creates some temporary information about the new segment under the load queue path of the compute node. Once this request is seen, the compute node will load the segment and begin servicing it.
Rules
-----
Segments are loaded and dropped from the cluster based on a set of rules. Rules indicate how segments should be assigned to different compute node tiers and how many replicants of a segment should exist in each tier. Rules may also indicate when segments should be dropped entirely from the cluster. The master loads a set of rules from the database. Rules may be specific to a certain datasource and/or a default set of rules can be configured. Rules are read in order and hence the ordering of rules is important. The master will cycle through all available segments and match each segment with the first rule that applies. Each segment may only match a single rule.
For more information on rules, see [[Rule Configuration]].
Cleaning Up Segments
--------------------
Each run, the Druid master compares the list of available segments in the database with the current segments in the cluster. Segments that are not in the database but are still being served in the cluster are flagged and appended to a removal list. Segments that are overshadowed (their versions are too old and their data has been replaced by newer segments) are also dropped.
Segment Availability
--------------------
If a compute node restarts or becomes unavailable for any reason, the Druid master will notice a node has gone missing and treat all segments served by that node as being dropped. Given a sufficient period of time, the segments may be reassigned to other compute nodes in the cluster. However, each segment that is dropped is not immediately forgotten. Instead, there is a transitional data structure that stores all dropped segments with an associated lifetime. The lifetime represents a period of time in which the master will not reassign a dropped segment. Hence, if a compute node becomes unavailable and available again within a short period of time, the compute node will start up and serve segments from its cache without any of those segments being reassigned across the cluster.
Balancing Segment Load
----------------------
To ensure an even distribution of segments across compute nodes in the cluster, the master node will find the total size of all segments being served by every compute node each time the master runs. For every compute node tier in the cluster, the master node will determine the compute node with the highest utilization and the compute node with the lowest utilization. The percent difference in utilization between the two nodes is computed, and if the result exceeds a certain threshold, a number of segments will be moved from the highest utilized node to the lowest utilized node. There is a configurable limit on the number of segments that can be moved from one node to another each time the master runs. Segments to be moved are selected at random and only moved if the resulting utilization calculation indicates the percentage difference between the highest and lowest servers has decreased.
HTTP Endpoints
--------------
The master node exposes several HTTP endpoints for interactions.
### GET
/info/master - returns the current true master of the cluster as a JSON object. E.g. a GET request to `<IP>:8080/info/master` will yield JSON of the form `{"host":"IP"}`
/info/cluster - returns JSON data about every node and segment in the cluster. E.g. a GET request to `<IP>:8080/info/cluster` will yield JSON data organized by nodes. Information about each node and each segment on each node will be returned.
/info/servers (optional param ?full) - returns all segments in the cluster if the full flag is not set, otherwise returns full metadata about all servers in the cluster
/info/servers/{serverName} - returns full metadata about a specific server
/info/servers/{serverName}/segments (optional param ?full) - returns a list of all segments for a server if the full flag is not set, otherwise returns all segment metadata
/info/servers/{serverName}/segments/{segmentId} - returns full metadata for a specific segment
/info/segments (optional param ?full) - returns all segments in the cluster as a list if the full flag is not set, otherwise returns all metadata about segments in the cluster
/info/segments/{segmentId} - returns full metadata for a specific segment
/info/datasources (optional param ?full) - returns a list of datasources in the cluster if the full flag is not set, otherwise returns all the metadata for every datasource in the cluster
/info/datasources/{dataSourceName} - returns full metadata for a datasource
/info/datasources/{dataSourceName}/segments (optional param ?full) - returns a list of all segments for a datasource if the full flag is not set, otherwise returns full segment metadata for a datasource
/info/datasources/{dataSourceName}/segments/{segmentId} - returns full segment metadata for a specific segment
/info/rules - returns all rules for all data sources in the cluster including the default datasource.
/info/rules/{dataSourceName} - returns all rules for a specified datasource
### POST
/info/rules/{dataSourceName} - POST with a list of rules in JSON form to update rules.
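To make the endpoints above concrete, here is a minimal sketch using curl; the host and port are assumptions based on the `<IP>:8080` example above, and the "wikipedia" datasource name is hypothetical:

```bash
# Who is the current true master of the cluster? (host and port are assumptions)
curl http://localhost:8080/info/master

# List the datasources in the cluster, then drill into one of them.
# The "wikipedia" datasource name is hypothetical.
curl http://localhost:8080/info/datasources
curl "http://localhost:8080/info/datasources/wikipedia/segments?full"
```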
The Master Console
------------------
The Druid master exposes a web GUI for displaying cluster information and rule configuration. After the master starts, the console can be accessed at http://HOST:PORT/static/. There exists a full cluster view, as well as views for individual compute nodes, datasources and segments themselves. Segment information can be displayed in raw JSON form or as part of a sortable and filterable table.
The master console also exposes an interface for creating and editing rules. All valid datasources configured in the segment database, along with a default datasource, are available for configuration. Rules of different types can be added, deleted or edited.
FAQ
---
1. **Do clients ever contact the master node?**
The master is not involved in the lifecycle of a query.
Compute nodes never directly contact the master node. The Druid master tells the compute nodes to load/drop data via Zookeeper, but the compute nodes are completely unaware of the master.
Brokers also never contact the master. Brokers base their understanding of the data topology on metadata exposed by the compute nodes via ZK and are completely unaware of the master.
2. **Does it matter if the master node starts up before or after other processes?**
No. If the Druid master is not started up, no new segments will be loaded in the cluster and outdated segments will not be dropped. However, the master node can be started up at any time, and after a configurable delay, will start running master tasks.
This also means that if you have a working cluster and all of your masters die, the cluster will continue to function; it just won’t experience any changes to its data topology.
Running
-------
Master nodes can be run using the `com.metamx.druid.http.MasterMain` class.
Configuration
-------------
See [[Configuration]].
MySQL is an external dependency of Druid. We use it to store various metadata about the system, but not to store the actual data. There are a number of tables used for various purposes described below.
Segments Table
--------------
This is dictated by the `druid.database.segmentTable` property (Note that these properties are going to change in the next stable version after 0.4.12).
This table stores metadata about the segments that are available in the system. The table is polled by the [[Master]] to determine the set of segments that should be available for querying in the system. The table has two main functional columns, the other columns are for indexing purposes.
The `used` column is a boolean “tombstone”. A 1 means that the segment should be “used” by the cluster (i.e. it should be loaded and available for requests). A 0 means that the segment should not be actively loaded into the cluster. We do this as a means of removing segments from the cluster without actually removing their metadata (which allows for simpler rolling back if that is ever an issue).
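For example, a segment can be pulled out of rotation without deleting its metadata by flipping this flag. The following is only a sketch: the database name, user, and `prod_segments` table name come from the example configurations elsewhere in these docs, while the `id` column and the segment identifier are assumptions for illustration.

```bash
# Disable a single segment (used = 0) so the master will unload it from the cluster;
# setting used = 1 again re-enables it. Prompts for the MySQL password.
# Table name, id column, and segment identifier are assumptions for illustration.
mysql -u druid -p druid -e \
  "UPDATE prod_segments SET used = 0 WHERE id = 'wikipedia_2012-05-23T00:00:00.000Z_2012-05-24T00:00:00.000Z_2012-05-24T00:10:00.046Z';"
```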
The `payload` column stores a JSON blob that has all of the metadata for the segment (some of the data stored in this payload is redundant with some of the columns in the table, that is intentional). This looks something like
{
"dataSource":"wikipedia",
"interval":"2012-05-23T00:00:00.000Z/2012-05-24T00:00:00.000Z",
"version":"2012-05-24T00:10:00.046Z",
"loadSpec":{"type":"s3_zip",
"bucket":"bucket_for_segment",
"key":"path/to/segment/on/s3"},
"dimensions":"comma-delimited-list-of-dimension-names",
"metrics":"comma-delimited-list-of-metric-names",
"shardSpec":{"type":"none"},
"binaryVersion":9,
"size":size_of_segment,
"identifier":"wikipedia_2012-05-23T00:00:00.000Z_2012-05-24T00:00:00.000Z_2012-05-23T00:10:00.046Z"
}
Note that the format of this blob can and will change from time-to-time.
Rule Table
----------
The rule table is used to store the various rules about where segments should land. These rules are used by the [[Master]] when making segment (re-)allocation decisions about the cluster.
Config Table
------------
The config table is used to store runtime configuration objects. We do not have many of these yet and we are not sure if we will keep this mechanism going forward, but it is the beginnings of a method of changing some configuration parameters across the cluster at runtime.
Task-related Tables
-------------------
There are also a number of tables created and used by the [[Indexing Service]] in the course of its work.
The orderBy field provides the functionality to sort and limit the set of results from a groupBy query. Available options are:
### DefaultLimitSpec
The default limit spec takes a limit and the list of columns to do an orderBy operation over. The grammar is:
<code>
{
"type" : "default",
"limit" : <integer_value>,
"columns" : [list of OrderByColumnSpec],
}
</code>
#### OrderByColumnSpec
OrderByColumnSpecs indicate how to do order by operations. Each order by condition can be a <code>String</code> or a map of the following form:
<code>
{
"dimension" : "<Any dimension or metric>",
"direction" : "ASCENDING OR DESCENDING"
}
</code>
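To show how this fits into a full groupBy query, here is a sketch only: the datasource, dimension and metric names, and the node address are assumptions for illustration, and the limit spec is attached under the orderBy field described at the top of this page.

```bash
# Hypothetical groupBy query that keeps only the top 5 "page" values by descending "edits".
# Datasource, column names, and host/port are assumptions for illustration.
curl -X POST "http://localhost:8080/druid/v2/?pretty" \
  -H 'content-type: application/json' \
  -d '{
        "queryType": "groupBy",
        "dataSource": "sample_datasource",
        "granularity": "all",
        "dimensions": ["page"],
        "orderBy": {
          "type": "default",
          "limit": 5,
          "columns": [ { "dimension": "edits", "direction": "DESCENDING" } ]
        },
        "aggregations": [ { "type": "longSum", "name": "edits", "fieldName": "count" } ],
        "intervals": ["2012-01-01T00:00/2013-01-01T00:00"]
      }'
```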
The Plumber handles generated segments both while they are being generated and when they are “done”. This is also technically a pluggable interface and there are multiple implementations. However, there are a lot of details handled by the plumber, so it is expected that there will only be a few implementations and that only more advanced third-parties will implement their own. See [here](https://github.com/metamx/druid/wiki/Plumber#available-plumbers) for a description of the plumbers included with Druid.
|Field|Type|Description|Required|
|-----|----|-----------|--------|
|type|String|Specifies the type of plumber. Each value will have its own configuration schema, plumbers packaged with Druid are described [here](https://github.com/metamx/druid/wiki/Plumber#available-plumbers)|yes|
We provide a brief description of the example to exemplify the types of things that are configured on the plumber.
- `windowPeriod` is the amount of lag time to allow events. The example is configured with a 10 minute window, meaning that any event with a timestamp more than 10 minutes older than the current time will be thrown away and not included in the segment generated by the realtime server.
- `basePersistDirectory` is the directory to put things that need persistence. The plumber is responsible for the actual intermediate persists and this tells it where to store those persists.
Available Plumbers
------------------
#### YeOldePlumber
This plumber creates single historical segments.
#### RealtimePlumber
This plumber creates real-time/mutable segments.
Post-aggregations are specifications of processing that should happen on aggregated values as they come out of Druid. If you include a post aggregation as part of a query, make sure to include all aggregators the post-aggregator requires.
There are several post-aggregators available.
### Arithmetic post-aggregator
The arithmetic post-aggregator applies the provided function to the given fields from left to right. The fields can be aggregators or other post aggregators.
Supported functions are `+`, `-`, `*`, and `/`
The grammar for an arithmetic post aggregation is:
<code>postAggregation : {
"type" : "arithmetic",
"name" : <output_name>,
"fn" : <arithmetic_function>,
"fields": [<post_aggregator>, <post_aggregator>, ...]
}</code>
### Field accessor post-aggregator
This returns the value produced by the specified [[aggregator|Aggregations]].
`fieldName` refers to the output name of the aggregator given in the [[aggregations|Aggregations]] portion of the query.
<code>field_accessor : {
"type" : "fieldAccess",
"fieldName" : <aggregator_name>
}</code>
### Constant post-aggregator
The constant post-aggregator always returns the specified value.
<code>constant : {
"type" : "constant",
"name" : <output_name>,
"value" : <numerical_value>,
}</code>
### Example Usage
In this example, let’s calculate a simple percentage using post aggregators. Let’s imagine our data set has a metric called “total”; the query below computes 100 * (sum of “total”) / (row count).
The format of the query JSON is as follows:
<code>
{
...
"aggregations" : [
{
"type" : "count",
"name" : "rows"
},
{
"type" : "doubleSum",
"name" : "tot",
"fieldName" : "total"
}
],
"postAggregations" : {
"type" : "arithmetic",
"name" : "average",
"fn" : "*",
"fields" : [
{
"type" : "arithmetic",
"name" : "div",
"fn" : "/",
"fields" : [
{
"type" : "fieldAccess",
"name" : "tot",
"fieldName" : "tot"
},
{
"type" : "fieldAccess",
"name" : "rows",
"fieldName" : "rows"
}
]
},
{
"type" : "constant",
"name": "const",
"value" : 100
}
]
}]
...
}
</code>
# Setup #
Before we start querying druid, we're going to finish setting up a complete cluster on localhost. In [[Loading Your Data]] we setup a [[Realtime]], [[Compute]] and [[Master]] node. If you've already completed that tutorial, you need only follow the directions for 'Booting a Broker Node'.
## Booting a Broker Node ##
1. Setup a config file at config/broker/runtime.properties that looks like this:
```
druid.host=0.0.0.0:8083
druid.port=8083
com.metamx.emitter.logging=true
druid.processing.formatString=processing_%s
druid.processing.numThreads=1
druid.processing.buffer.sizeBytes=10000000
#emitting, opaque marker
druid.service=example
druid.request.logging.dir=/tmp/example/log
druid.realtime.specFile=realtime.spec
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=debug
# below are dummy values when operating a realtime only node
druid.processing.numThreads=3
com.metamx.aws.accessKey=dummy_access_key
com.metamx.aws.secretKey=dummy_secret_key
druid.pusher.s3.bucket=dummy_s3_bucket
druid.zk.service.host=localhost
druid.server.maxSize=300000000000
druid.zk.paths.base=/druid
druid.database.segmentTable=prod_segments
druid.database.user=druid
druid.database.password=diurd
druid.database.connectURI=jdbc:mysql://localhost:3306/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
druid.database.ruleTable=rules
druid.database.configTable=config
# Path on local FS for storage of segments; dir will be created if needed
druid.paths.indexCache=/tmp/druid/indexCache
# Path on local FS for storage of segment metadata; dir will be created if needed
druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
druid.pusher.local.storageDirectory=/tmp/druid/localStorage
druid.pusher.local=true
# thread pool size for servicing queries
druid.client.http.connections=30
```
2. Run the broker node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.realtime.specFile=realtime.spec \
-classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/broker \
com.metamx.druid.http.BrokerMain
```
## Booting a Master Node ##
1. Setup a config file at config/master/runtime.properties that looks like this: [https://gist.github.com/rjurney/5818870](https://gist.github.com/rjurney/5818870)
2. Run the master node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/master \
com.metamx.druid.http.MasterMain
```
## Booting a Realtime Node ##
1. Setup a config file at config/realtime/runtime.properties that looks like this: [https://gist.github.com/rjurney/5818774](https://gist.github.com/rjurney/5818774)
2. Setup a realtime.spec file like this: [https://gist.github.com/rjurney/5818779](https://gist.github.com/rjurney/5818779)
3. Run the realtime node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.realtime.specFile=realtime.spec \
-classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/realtime \
com.metamx.druid.realtime.RealtimeMain
```
## Booting a Compute Node ##
1. Setup a config file at config/compute/runtime.properties that looks like this: [https://gist.github.com/rjurney/5818885](https://gist.github.com/rjurney/5818885)
2. Run the compute node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/compute \
com.metamx.druid.http.ComputeMain
```
# Querying Your Data #
Now that we have a complete cluster setup on localhost, we need to load data. To do so, refer to [[Loading Your Data]]. Having done that, it's time to query our data! For a complete specification of queries, see [[Querying]].
## Querying Different Nodes ##
Druid is a shared-nothing system, and there are three ways to query it: against the [[Realtime]], [[Compute]] or [[Broker]] node. Querying a realtime node returns only realtime data, and querying a compute node returns only historical segments. Querying the broker will query both realtime and compute segments and compose an overall result for the query. This is the normal mode of operation for queries in Druid.
### Construct a Query ###
Save the following query into a file called query.body; the curl commands below POST this file. For an explanation of how this query was constructed, see the "Querying Against the realtime.spec" section below.
```json
{
"queryType": "groupBy",
"dataSource": "druidtest",
"granularity": "all",
"dimensions": [],
"aggregations": [
{"type": "count", "name": "rows"},
{"type": "longSum", "name": "imps", "fieldName": "impressions"},
{"type": "doubleSum", "name": "wp", "fieldName": "wp"}
],
"intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```
### Querying the Realtime Node ###
Run our query against port 8080:
```bash
curl -X POST "http://localhost:8080/druid/v2/?pretty" \
-H 'content-type: application/json' -d @query.body
```
See our result:
```json
[ {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 5,
"wp" : 15000.0,
"rows" : 5
}
} ]
```
### Querying the Compute Node ###
Run the query against port 8082:
```bash
curl -X POST "http://localhost:8082/druid/v2/?pretty" \
-H 'content-type: application/json' -d @query.body
```
And get (similar to):
```json
[ {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 27,
"wp" : 77000.0,
"rows" : 9
}
} ]
```
### Querying both Nodes via the Broker ###
Run the query against port 8083:
```bash
curl -X POST "http://localhost:8083/druid/v2/?pretty" \
-H 'content-type: application/json' -d @query.body
```
And get:
```json
[ {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 5,
"wp" : 15000.0,
"rows" : 5
}
} ]
```
Now that we know which nodes can be queried (although you should usually use the broker node), let's learn how to find out which queries are available.
## Querying Against the realtime.spec ##
How are we to know what queries we can run? Although [[Querying]] is a helpful index, to get a handle on querying our data we need to look at our [[Realtime]] node's realtime.spec file:
```json
[{
"schema" : { "dataSource":"druidtest",
"aggregators":[ {"type":"count", "name":"impressions"},
{"type":"doubleSum","name":"wp","fieldName":"wp"}],
"indexGranularity":"minute",
"shardSpec" : { "type": "none" } },
"config" : { "maxRowsInMemory" : 500000,
"intermediatePersistPeriod" : "PT10m" },
"firehose" : { "type" : "kafka-0.7.2",
"consumerProps" : { "zk.connect" : "localhost:2181",
"zk.connectiontimeout.ms" : "15000",
"zk.sessiontimeout.ms" : "15000",
"zk.synctime.ms" : "5000",
"groupid" : "topic-pixel-local",
"fetch.size" : "1048586",
"autooffset.reset" : "largest",
"autocommit.enable" : "false" },
"feed" : "druidtest",
"parser" : { "timestampSpec" : { "column" : "utcdt", "format" : "iso" },
"data" : { "format" : "json" },
"dimensionExclusions" : ["wp"] } },
"plumber" : { "type" : "realtime",
"windowPeriod" : "PT10m",
"segmentGranularity":"hour",
"basePersistDirectory" : "/tmp/realtime/basePersist",
"rejectionPolicy": {"type": "messageTime"} }
}]
```
### dataSource ###
```json
"dataSource":"druidtest"
```
Our dataSource tells us the name of the relation/table, or 'source of data', to query in both our realtime.spec and query.body!
### aggregations ###
Note the [[Aggregations]] in our query:
```json
"aggregations": [
{"type": "count", "name": "rows"},
{"type": "longSum", "name": "imps", "fieldName": "impressions"},
{"type": "doubleSum", "name": "wp", "fieldName": "wp"}
],
```
This matches up to the aggregators in the schema of our realtime.spec!
```json
"aggregators":[ {"type":"count", "name":"impressions"},
{"type":"doubleSum","name":"wp","fieldName":"wp"}],
```
### dimensions ###
Let's look back at our actual records (from [[Loading Your Data]]):
```json
{"utcdt": "2010-01-01T01:01:01", "wp": 1000, "gender": "male", "age": 100}
{"utcdt": "2010-01-01T01:01:02", "wp": 2000, "gender": "female", "age": 50}
{"utcdt": "2010-01-01T01:01:03", "wp": 3000, "gender": "male", "age": 20}
{"utcdt": "2010-01-01T01:01:04", "wp": 4000, "gender": "female", "age": 30}
{"utcdt": "2010-01-01T01:01:05", "wp": 5000, "gender": "male", "age": 40}
```
Note that we have two dimensions to our data, other than our primary metric, wp. They are 'gender' and 'age'. We can specify these in our query! Note that we have added a dimension: age, below.
```json
{
"queryType": "groupBy",
"dataSource": "druidtest",
"granularity": "all",
"dimensions": ["age"],
"aggregations": [
{"type": "count", "name": "rows"},
{"type": "longSum", "name": "imps", "fieldName": "impressions"},
{"type": "doubleSum", "name": "wp", "fieldName": "wp"}
],
"intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```
Which gets us grouped data in return!
```json
[ {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 1,
"age" : "100",
"wp" : 1000.0,
"rows" : 1
}
}, {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 1,
"age" : "20",
"wp" : 3000.0,
"rows" : 1
}
}, {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 1,
"age" : "30",
"wp" : 4000.0,
"rows" : 1
}
}, {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 1,
"age" : "40",
"wp" : 5000.0,
"rows" : 1
}
}, {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 1,
"age" : "50",
"wp" : 2000.0,
"rows" : 1
}
} ]
```
### filtering ###
Now that we've observed our dimensions, we can also filter:
```json
{
"queryType": "groupBy",
"dataSource": "druidtest",
"granularity": "all",
"filter": {
"type": "selector",
"dimension": "gender",
"value": "male"
},
"aggregations": [
{"type": "count", "name": "rows"},
{"type": "longSum", "name": "imps", "fieldName": "impressions"},
{"type": "doubleSum", "name": "wp", "fieldName": "wp"}
],
"intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```
Which gets us just the rows where gender is 'male':
```json
[ {
"version" : "v1",
"timestamp" : "2010-01-01T00:00:00.000Z",
"event" : {
"imps" : 3,
"wp" : 9000.0,
"rows" : 3
}
} ]
```
Check out [[Filters]] for more.
## Learn More ##
You can learn more about querying at [[Querying]]! Now check out [[Booting a production cluster]]!
Querying
========
Queries are made using an HTTP REST style request to a [[Broker]], [[Compute]], or [[Realtime]] node. The query is expressed in JSON and each of these node types expose the same REST query interface.
We start by describing an example query with additional comments that mention possible variations. Query operators are also summarized in a table below.
Example Query “rand”
--------------------
Here is the query in the examples/rand subproject (file is query.body), followed by a commented version of the same.
```javascript
{
  "queryType": "groupBy",
  "dataSource": "randSeq",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "doubleSum", "fieldName": "events", "name": "e" },
    { "type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum" }
  ],
  "postAggregations": [{
    "type": "arithmetic",
    "name": "avg_random",
    "fn": "/",
    "fields": [
      { "type": "fieldAccess", "fieldName": "randomNumberSum" },
      { "type": "fieldAccess", "fieldName": "rows" }
    ]
  }],
  "intervals": ["2012-10-01T00:00/2020-01-01T00"]
}
```
This query could be submitted via curl like so (assuming the query object is in a file “query.json”).
curl -X POST "http://host:port/druid/v2/?pretty" -H 'content-type: application/json' -d @query.json
The “pretty” query parameter gets the results formatted a bit nicer.
Details of Example Query “rand”
-------------------------------
The queryType JSON field identifies which kind of query operator is to be used, in this case it is groupBy, the most frequently used kind (which corresponds to an internal implementation class GroupByQuery registered as “groupBy”), and it has a set of required fields that are also part of this query. The queryType can also be “search” or “timeBoundary” which have similar or different required fields summarized below:
```javascript
{
  "queryType": "groupBy",
```
The dataSource JSON field shown next identifies where to apply the query. In this case, randSeq corresponds to the examples/rand/rand\_realtime.spec file schema:
```javascript
  "dataSource": "randSeq",
```
The granularity JSON field specifies the bucket size for values. It could be a built-in time interval like “second”, “minute”, “fifteen\_minute”, “thirty\_minute”, “hour” or “day”. It can also be an expression like `{"type": "period", "period":"PT6m"}` meaning “6 minute buckets”. See [[Granularities]] for more information on the different options for this field. In this example, it is set to the special value “all”, which means bucket all data points together into the same time bucket:
```javascript
  "granularity": "all",
```
The dimensions JSON field value is an array of zero or more fields as defined in the dataSource spec file or defined in the input records and carried forward. These are used to constrain the grouping. If empty, then one value per time granularity bucket is requested in the groupBy:
```javascript
  "dimensions": [],
```
A groupBy also requires the JSON field “aggregations” (See [[Aggregations]]), which are applied to the column specified by fieldName and the output of the aggregation will be named according to the value in the “name” field:
```javascript
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "doubleSum", "fieldName": "events", "name": "e" },
    { "type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum" }
  ],
```
You can also specify postAggregations, which are applied after data has been aggregated for the current granularity and dimensions bucket. See [[Post Aggregations]] for a detailed description. In the rand example, an arithmetic type operation (division, as specified by “fn”) is performed with the result “name” of “avg\_random”. The “fields” field specifies the inputs from the aggregation stage to this expression. Note that identifiers corresponding to “name” JSON field inside the type “fieldAccess” are required but not used outside this expression, so they are prefixed with “dummy” for clarity:
```javascript
  "postAggregations": [{
    "type": "arithmetic",
    "name": "avg_random",
    "fn": "/",
    "fields": [
      { "type": "fieldAccess", "fieldName": "randomNumberSum" },
      { "type": "fieldAccess", "fieldName": "rows" }
    ]
  }],
```
The time range(s) of the query; data outside the specified intervals will not be used; this example specifies from October 1, 2012 until January 1, 2020:
```javascript
  "intervals": ["2012-10-01T00:00/2020-01-01T00"]
}
```
Query Operators
---------------
The following table summarizes query properties.
|query types|property|description|required?|
|-----------|--------|-----------|---------|
|timeseries, groupBy, search, timeBoundary|dataSource|query is applied to this data source|yes|
|timeseries, groupBy, search|intervals|range of time series to include in query|yes|
|timeseries, groupBy, search, timeBoundary|context|This is a key-value map that can allow the query to alter some of the behavior of a query. It is primarily used for debugging, for example if you include `"bySegment":true` in the map, you will get results associated with the data segment they came from.|no|
|timeseries, groupBy, search|filter|Specifies the filter (the “WHERE” clause in SQL) for the query. See [[Filters]]|no|
|timeseries, groupBy, search|granularity|the timestamp granularity to bucket results into (i.e. “hour”). See [[Granularities]] for more information.|no|
|groupBy|dimensions|constrains the groupings; if empty, then one value per time granularity bucket|yes|
|timeseries, groupBy|aggregations|aggregations that combine values in a bucket. See [[Aggregations]].|yes|
|timeseries, groupBy|postAggregations|aggregations of aggregations. See [[Post Aggregations]].|no|
|search|limit|maximum number of results (default is 1000), a system-level maximum can also be set via `com.metamx.query.search.maxSearchLimit`|no|
|search|searchDimensions|Dimensions to apply the search query to. If not specified, it will search through all dimensions.|no|
|search|query|The query portion of the search query. This is essentially a predicate that specifies if something matches.|yes|
Additional Information about Query Types
----------------------------------------
[[TimeseriesQuery]]
Realtime
========
Realtime nodes provide a realtime index. Data indexed via these nodes is immediately available for querying. Realtime nodes will periodically build segments representing the data they’ve collected over some span of time and hand these segments off to [[Compute]] nodes.
Running
-------
Realtime nodes can be run using the `com.metamx.druid.realtime.RealtimeMain` class.
Segment Propagation
-------------------
The segment propagation diagram for real-time data ingestion can be seen below:
![Segment Propagation](https://raw.github.com/metamx/druid/druid-0.5.4/doc/segment_propagation.png "Segment Propagation")
Configuration
-------------
Realtime nodes take a mix of base server configuration and spec files that describe how to connect, process and expose the realtime feed. See [[Configuration]] for information about general server configuration.
### Realtime “specFile”
The property `druid.realtime.specFile` has the path of a file (absolute or relative path and file name) with realtime specifications in it. This “specFile” should be a JSON Array of JSON objects like the following:
<code>
[{
"schema" : { "dataSource":"dataSourceName",
"aggregators":[ {"type":"count", "name":"events"},
{"type":"doubleSum","name":"outColumn","fieldName":"inColumn"} ],
"indexGranularity":"minute",
"shardSpec" : { "type": "none" } },
"config" : { "maxRowsInMemory" : 500000,
"intermediatePersistPeriod" : "PT10m" },
"firehose" : { "type" : "kafka-0.7.2",
"consumerProps" : { "zk.connect" : "zk_connect_string",
"zk.connectiontimeout.ms" : "15000",
"zk.sessiontimeout.ms" : "15000",
"zk.synctime.ms" : "5000",
"groupid" : "consumer-group",
"fetch.size" : "1048586",
"autooffset.reset" : "largest",
"autocommit.enable" : "false" },
"feed" : "your_kafka_topic",
"parser" : { "timestampSpec" : { "column" : "timestamp", "format" : "iso" },
"data" : { "format" : "json" },
"dimensionExclusions" : ["value"] } },
"plumber" : { "type" : "realtime",
"windowPeriod" : "PT10m",
"segmentGranularity":"hour",
"basePersistDirectory" : "/tmp/realtime/basePersist" }
}]
</code>
This is a JSON Array, so you can give more than one realtime stream to a given node. The number you can put in the same process depends on the exact configuration. In general, it is best to think of each realtime stream handler as requiring 2 threads: 1 thread for data consumption and aggregation, and 1 thread for incremental persists and other background tasks.
There are four parts to a realtime stream specification, `schema`, `config`, `firehose` and `plumber` which we will go into here.
#### Schema
This describes the data schema for the output Druid segment. More information about concepts in Druid and querying can be found at [[Concepts-and-Terminology]] and [[Querying]].
|Field|Type|Description|Required|
|-----|----|-----------|--------|
|aggregators|Array of Objects|The list of aggregators to use to aggregate colliding rows together.|yes|
|dataSource|String|The name of the dataSource that the segment belongs to.|yes|
|indexGranularity|String|The granularity of the data inside the segment. E.g. a value of “minute” will mean that data is aggregated at minutely granularity. That is, if there are collisions in the tuple (minute(timestamp), dimensions), then it will aggregate values together using the aggregators instead of storing individual rows.|yes|
|segmentGranularity|String|The granularity of the segment as a whole. This is generally larger than the index granularity and describes the rate at which the realtime server will push segments out for historical servers to take over.|yes|
|shardSpec|Object|This describes the shard that is represented by this server. This must be specified properly in order to have multiple realtime nodes indexing the same data stream in a sharded fashion.|no|
### Config
This provides configuration for the data processing portion of the realtime stream processor.
|Field|Type|Description|Required|
|-----|----|-----------|--------|
|intermediatePersistPeriod|ISO8601 Period String|The period that determines the rate at which intermediate persists occur. These persists determine how often commits happen against the incoming realtime stream. If the realtime data loading process is interrupted at time T, it should be restarted to re-read data that arrived at T minus this period.|yes|
|maxRowsInMemory|Number|The number of rows to aggregate before persisting. This number is the post-aggregation rows, so it is not equivalent to the number of input events, but the number of aggregated rows that those events result in. This is used to manage the required JVM heap size.|yes|
### Firehose
See [[Firehose]].
### Plumber
See [[Plumber]]
Constraints
-----------
The following table summarizes constraints between settings in the spec file for the Realtime subsystem.

|Name|Effect|Minimum|Recommended|
|----|------|-------|-----------|
|windowPeriod|when reading an InputRow, events with timestamp older than now minus this window are discarded|time jitter tolerance|use this to reject outliers|
|segmentGranularity|time granularity (minute, hour, day, week, month) for loading data at query time|equal to indexGranularity|more than indexGranularity|
|indexGranularity|time granularity (minute, hour, day, week, month) of indexes|less than segmentGranularity|minute, hour, day, week, month|
|intermediatePersistPeriod|the max real time (ISO8601 Period) between flushes of InputRows from memory to disk|avoid excessive flushing|number of un-persisted rows in memory also constrained by maxRowsInMemory|
|maxRowsInMemory|the max number of InputRows to hold in memory before a flush to disk|number of un-persisted post-aggregation rows in memory is also constrained by intermediatePersistPeriod|use this to avoid running out of heap if too many rows in an intermediatePersistPeriod|
The normal, expected use cases have the following overall constraints: `indexGranularity < intermediatePersistPeriod <= windowPeriod < segmentGranularity`. For example, the spec above uses an indexGranularity of “minute”, an intermediatePersistPeriod and windowPeriod of “PT10m”, and a segmentGranularity of “hour”, which satisfies this ordering.
If the RealtimeNode process runs out of heap, try adjusting the `druid.computation.buffer.size` property, which specifies a size in bytes that must fit into the heap.
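As a sketch only (the buffer value shown and the use of a JVM system property to set it are assumptions; adjust both to your deployment), lowering the buffer when starting a realtime node could look like:

```bash
# Start a realtime node with a smaller computation buffer (100MB) so it fits in a modest heap.
# The buffer value and the -D style of setting the property are assumptions for illustration.
java -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -Ddruid.realtime.specFile=realtime.spec \
  -Ddruid.computation.buffer.size=100000000 \
  -classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/realtime \
  com.metamx.druid.realtime.RealtimeMain
```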
Requirements
------------
Realtime nodes currently require a Kafka cluster to sit in front of them and collect results. There’s more configuration required for these as well.
Extending the code
------------------
Realtime integration is intended to be extended in two ways:
1. Connect to data streams from varied systems ([Firehose](https://github.com/metamx/druid/blob/master/realtime/src/main/java/com/metamx/druid/realtime/FirehoseFactory.java))
2. Adjust the publishing strategy to match your needs ([Plumber](https://github.com/metamx/druid/blob/master/realtime/src/main/java/com/metamx/druid/realtime/PlumberSchool.java))
The expectations are that the former will be very common and something that users of Druid will do on a fairly regular basis. Most users will probably never have to deal with the latter form of customization. Indeed, we hope that all potential use cases can be packaged up as part of Druid proper without requiring proprietary customization.
Given those expectations, adding a firehose is straightforward and completely encapsulated inside of the interface. Adding a plumber is more involved and requires understanding of how the system works to get right; it’s not impossible, but it’s not intended that individuals new to Druid will be able to do it immediately.
We will do our best to accept contributions from the community of new Firehoses and Plumbers, but we also understand the requirement for being able to plug in your own proprietary implementations. The model for doing this is by embedding the druid code in another project and writing your own `main()` method that initializes a RealtimeNode object and registers your proprietary objects with it.
<code>
public class MyRealtimeMain
{
  private static final Logger log = new Logger(MyRealtimeMain.class);

  public static void main(String[] args) throws Exception
  {
    LogLevelAdjuster.register();

    Lifecycle lifecycle = new Lifecycle();

    lifecycle.addManagedInstance(
        RealtimeNode.builder()
                    .build()
                    .registerJacksonSubtype(foo.bar.MyFirehose.class)
    );

    try {
      lifecycle.start();
    }
    catch (Throwable t) {
      log.info(t, "Throwable caught at startup, committing seppuku");
      System.exit(2);
    }

    lifecycle.join();
  }
}
</code>
Pluggable pieces of the system are either handled by a setter on the RealtimeNode object, or they are configuration driven and need to be setup to allow for [Jackson polymorphic deserialization](http://wiki.fasterxml.com/JacksonPolymorphicDeserialization) and registered via the relevant methods on the RealtimeNode object.
Note: It is recommended that the master console be used to configure rules. However, the master node does have HTTP endpoints to programmatically configure rules.
Load Rules
----------
Load rules indicate how many replicants of a segment should exist in a server tier.
### Interval Load Rule
Interval load rules are of the form:
<code>
{
"type" : "loadByInterval",
"interval" : "2012-01-01/2013-01-01",
"tier" : "hot"
}
</code>
type - this should always be “loadByInterval”
interval - A JSON Object representing ISO-8601 Intervals
tier - the configured compute node tier
### Period Load Rule
Period load rules are of the form:
<code>
{
"type" : "loadByInterval",
"period" : "P1M",
"tier" : "hot"
}
</code>
type - this should always be “loadByPeriod”
period - A JSON Object representing ISO-8601 Periods
tier - the configured compute node tier
The interval of a segment will be compared against the specified period. The rule matches if the period overlaps the interval.
Drop Rules
----------
Drop rules indicate when segments should be dropped from the cluster.
### Interval Drop Rule
Interval drop rules are of the form:
<code>
{
"type" : "dropByInterval",
"interval" : "2012-01-01/2013-01-01"
}
</code>
type - this should always be “dropByInterval”
interval - A JSON Object representing ISO-8601 Intervals
A segment is dropped if the interval contains the interval of the segment.
### Period Drop Rule
Period drop rules are of the form:
<code>
{
"type" : "dropByPeriod",
"period" : "P1M"
}
</code>
type - this should always be “dropByPeriod”
period - A JSON Object representing ISO-8601 Periods
The interval of a segment will be compared against the specified period. The period is from some time in the past to the current time. The rule matches if the period contains the interval.
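Putting these rule types together, the rules for a datasource can be updated by POSTing a JSON list of rules to the master’s rules endpoint described in the Master documentation. The following is a sketch only; the master host and port and the “wikipedia” datasource name are assumptions for illustration.

```bash
# Load the most recent month of "wikipedia" data into the "hot" tier, and drop segments
# whose intervals fall entirely within 2012. Host, port, and datasource name are assumptions.
curl -X POST http://localhost:8080/info/rules/wikipedia \
  -H 'content-type: application/json' \
  -d '[
        { "type": "loadByPeriod",   "period": "P1M", "tier": "hot" },
        { "type": "dropByInterval", "interval": "2012-01-01/2013-01-01" }
      ]'
```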
A search query returns dimension values that match the search specification.
<code>{
"queryType": "search",
"dataSource": "sample_datasource",
"granularity": "day",
"searchDimensions": [
"dim1",
"dim2"
],
"query": {
"type": "insensitive_contains",
"value": "Ke"
},
"sort" : {
"type": "lexicographic"
},
"intervals": [
"2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"
]
}
</code>
There are several main parts to a search query:
|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be “search”; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String defining the data source to query, very similar to a table in a relational database|yes|
|granularity|Defines the granularity of the query. See [[Granularities]]|yes|
|filter|See [[Filters]]|no|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|searchDimensions|The dimensions to run the search over. Excluding this means the search is run over all dimensions.|no|
|query|See [[SearchQuerySpec]].|yes|
|sort|How the results of the search should be sorted. Two possible types here are “lexicographic” and “strlen”.|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|
The format of the result is:
<code>[
{
"timestamp": "2012-01-01T00:00:00.000Z",
"result": [
{
"dimension": "dim1",
"value": "Ke$ha"
},
{
"dimension": "dim2",
"value": "Ke$haForPresident"
}
]
},
{
"timestamp": "2012-01-02T00:00:00.000Z",
"result": [
{
"dimension": "dim1",
"value": "SomethingThatContainsKe"
},
{
"dimension": "dim2",
"value": "SomethingElseThatContainsKe"
}
]
}
]
</code>
Search query specs define how a “match” is defined between a search value and a dimension value. The available search query specs are:
InsensitiveContainsSearchQuerySpec
----------------------------------
If any part of a dimension value contains the value specified in this search query spec, regardless of case, a “match” occurs. The grammar is:
<code>{
"type" : "insensitive_contains",
"value" : "some_value"
}
</code>
FragmentSearchQuerySpec
-----------------------
If any part of a dimension value contains any of the values specified in this search query spec, regardless of case, a “match” occurs. The grammar is:
<code>{
"type" : "fragment",
"values" : ["fragment1", "fragment2"]
}
</code>
Segment metadata queries return per segment information about:
* Cardinality of all columns in the segment
* Estimated byte size for the segment columns in TSV format
* Interval the segment covers
* Column type of all the columns in the segment
* Estimated total segment byte size in TSV format
* Segment id
<code>{
"queryType":"segmentMetadata",
"dataSource":"sample_datasource",
"intervals":["2013-01-01/2014-01-01"],
}
</code>
There are several main parts to a segment metadata query:
|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be “segmentMetadata”; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String defining the data source to query, very similar to a table in a relational database|yes|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|merge|Merge all individual segment metadata results into a single result|no|
|context|An additional JSON Object which can be used to specify certain flags.|no|
The format of the result is:
<code>[ {
"id" : "some_id",
"intervals" : [ "2013-05-13T00:00:00.000Z/2013-05-14T00:00:00.000Z" ],
"columns" : {
"__time" : {
"type" : "LONG",
"size" : 407240380,
"cardinality" : null
},
"dim1" : {
"type" : "STRING",
"size" : 100000,
"cardinality" : 1944
},
"dim2" : {
"type" : "STRING",
"size" : 100000,
"cardinality" : 1504
},
"metric1" : {
"type" : "FLOAT",
"size" : 100000,
"cardinality" : null
}
},
"size" : 300000
} ]
</code>
Segments
========
Segments are the fundamental structure to store data in Druid. [[Compute]] and [[Realtime]] nodes load and serve segments for querying. To construct segments, Druid will always shard data by a time partition. Data may be further sharded based on dimension cardinality and row count.
The latest Druid segment version is `v9`.
Naming Convention
-----------------
Identifiers for segments are typically constructed using the segment datasource, interval start time (in ISO 8601 format), interval end time (in ISO 8601 format), and a version. If data is additionally sharded beyond a time range, the segment identifier will also contain a partition number.
An example segment identifier may be:
datasource\_intervalStart\_intervalEnd\_version\_partitionNum
Segment Components
------------------
A segment is comprised of several files, listed below.
### `version.bin`
4 bytes representing the current segment version as an integer. E.g., for v9 segments, the version is 0x0, 0x0, 0x0, 0x9
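As a quick sanity check, the version bytes can be inspected directly; a minimal sketch, assuming the standard `xxd` tool and a segment unpacked into the current directory:

```bash
# Dump version.bin as hex; a v9 segment prints "00000009".
# Assumes xxd is installed and the segment files have been unpacked locally.
xxd -p version.bin
```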
### `meta.smoosh`
A file with metadata (filenames and offsets) about the contents of the other `smoosh` files
### `XXXXX.smoosh`
There are some number of these files, which are concatenated binary data
The `smoosh` files represent multiple files “smooshed” together in order to minimize the number of file descriptors that must be open to house the data. They are files of up to 2GB in size (to match the limit of a memory mapped ByteBuffer in Java). The `smoosh` files house individual files for each of the columns in the data as well as an `index.drd` file with extra metadata about the segment.
There is also a special column called `__time` that refers to the time column of the segment. This will hopefully become less and less special as the code evolves, but for now it’s as special as my Mommy always told me I am.
### `index.drd`
The `index.drd` file houses 3 pieces of data in order
1. The names of all of the columns of the data
2. The names of the “dimensions” of the data (these are the dictionary-encoded, string columns. This is here to support some legacy APIs and will be superfluous in the future)
3. The data interval represented by this segment stored as the start and end timestamps as longs
Format of a column
------------------
Each column is stored as two parts:
1. A Jackson-serialized ColumnDescriptor
2. The rest of the binary for the column
A ColumnDescriptor is essentially an object that allows us to use jackson’s polymorphic deserialization to add new and interesting methods of serialization with minimal impact to the code. It consists of some metadata about the column (what type is it, is it multi-valued, etc.) and then a list of serde logic that can deserialize the rest of the binary.
Sharding Data to Create Segments
--------------------------------
### Sharding Data by Dimension
If the cumulative total number of rows for the different values of a given column exceeds some configurable threshold, multiple segments representing the same time interval for the same datasource may be created. These segments will contain some partition number as part of their identifier. Sharding by dimension reduces some of the costs associated with operations over high cardinality dimensions.
Note: This feature is highly experimental and only works with spatially indexed dimensions.
The grammar for a spatial filter is as follows:
<code>
{
"dimension": "spatialDim",
"bound": {
"type": "rectangular",
"minCoords": [10.0, 20.0],
"maxCoords": [30.0, 40.0]
}
}
</code>
Bounds
------
### Rectangular
|property|description|required?|
|--------|-----------|---------|
|minCoords|List of minimum dimension coordinates for coordinates [x, y, z, …]|yes|
|maxCoords|List of maximum dimension coordinates for coordinates [x, y, z, …]|yes|
### Radius
|property|description|required?|
|--------|-----------|---------|
|coords|Origin coordinates in the form [x, y, z, …]|yes|
|radius|The float radius value|yes|
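For comparison with the rectangular example at the top of this page, a radius bound simply swaps the `bound` object. This is a sketch only: the dimension name and the coordinate and radius values are illustrative, and the surrounding filter grammar is the one shown above.

```bash
# Write an illustrative spatial filter that uses a "radius" bound, mirroring the
# rectangular grammar above. Dimension name and numeric values are assumptions.
cat > spatial_radius_filter.json <<'EOF'
{
  "dimension": "spatialDim",
  "bound": {
    "type": "radius",
    "coords": [10.0, 20.0],
    "radius": 5.0
  }
}
EOF
```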
Note: This feature is highly experimental.
In any of the data specs, there is now the option of providing spatial dimensions. For example, for a JSON data spec, spatial dimensions can be specified as follows:
<code>
{
"type": "JSON",
"dimensions": <some_dims>,
"spatialDimensions": [
{
"dimName": "coordinates",
"dims": ["lat", "long"]
},
...
]
}
</code>
|property|description|required?|
|--------|-----------|---------|
|dimName|The name of the spatial dimension. A spatial dimension may be constructed from multiple other dimensions or it may already exist as part of an event. If a spatial dimension already exists, it must be an array of dimension values.|yes|
|dims|A list of dimension names that comprise a spatial dimension.|no|
This page describes how to use Riak-CS for deep storage instead of S3. We are still setting up some of the peripheral stuff (file downloads, etc.).
This guide provided by Pablo Nebrera, thanks!
## The VMWare instance
A VMWare [image](http://static.druid.io/artifacts/vmware/druid_riak.tgz) based on Druid 0.3.27.2 and built according to the instructions below has also been provided by Pablo Nebrera.
The provided VMWare machine can be accessed with the following credentials:
username: root
password: riakdruid
## The Setup
We started with a minimal CentOS installation, but you can use any other compatible installation. At the end of this setup you will have one node that is running:
1. A Kafka Broker
1. A single-node Zookeeper ensemble
1. A single-node Riak-CS cluster
1. A Druid [[Master]]
1. A Druid [[Broker]]
1. A Druid [[Compute]]
1. A Druid [[Realtime]]
This just walks through getting the relevant software installed and running. You will then need to configure the [[Realtime]] node to take in your data.
### Configure System
1. Install `CentOS-6.4-x86_64-minimal.iso` ("RedHat v6.4" is the name of the AWS AMI) or your favorite Linux OS (if you use a different OS, some of the installation instructions for peripheral services might differ, please adjust them according to the system you are using). The rest of these instructions assume that you have a running instance and are running as the root user.
1. Configure the network. We used dhcp executing:
dhclient eth0
1. Disable firewall for now
service iptables stop
chkconfig iptables off
1. Change the limits on the number of open files a process can have:
cat >> /etc/security/limits.conf <<- _RBEOF_
# ulimit settings for Riak CS
root soft nofile 65536
root hard nofile 65536
riak soft nofile 65536
riak hard nofile 65536
_RBEOF_
ulimit -n 65536
### Install base software packages
1. Install necessary software with yum
yum install -y java-1.7.0-openjdk-devel git wget mysql-server
1. Install maven
wget http://apache.rediris.es/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
tar xzf apache-maven-3.0.5-bin.tar.gz -C /usr/local
pushd /usr/local
sudo ln -s apache-maven-3.0.5 maven
popd
echo 'export M2_HOME=/usr/local/maven' >> /etc/profile.d/maven.sh
echo 'export PATH=${M2_HOME}/bin:${PATH}' >> /etc/profile.d/maven.sh
source /etc/profile.d/maven.sh
1. Install erlang
wget http://binaries.erlang-solutions.com/rpm/centos/6/x86_64/esl-erlang-R15B01-1.x86_64.rpm
yum localinstall -y esl-erlang-R15B01-1.x86_64.rpm
### Install Kafka And Zookeeper
1. Install kafka and zookeeper:
wget http://apache.rediris.es/incubator/kafka/kafka-0.7.2-incubating/kafka-0.7.2-incubating-src.tgz
tar zxvf kafka-0.7.2-incubating-src.tgz
pushd kafka-0.7.2-incubating-src/
./sbt update
./sbt package
mkdir -p /var/lib/kafka
rsync -a * /var/lib/kafka/
popd
### Install Riak-CS
1. Install s3cmd to manage riak s3
wget http://downloads.sourceforge.net/project/s3tools/s3cmd/1.5.0-alpha3/s3cmd-1.5.0-alpha3.tar.gz
tar xzvf s3cmd-1.5.0-alpha3.tar.gz
cd s3cmd-1.5.0-alpha3
cp -r s3cmd S3 /usr/local/bin/
1. Install riak, riak-cs and stanchion. Note: riak-cs-control is optional
wget http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.1/rhel/6/riak-1.3.1-1.el6.x86_64.rpm
wget http://s3.amazonaws.com/downloads.basho.com/riak-cs/1.3/1.3.1/rhel/6/riak-cs-1.3.1-1.el6.x86_64.rpm
wget http://s3.amazonaws.com/downloads.basho.com/stanchion/1.3/1.3.1/rhel/6/stanchion-1.3.1-1.el6.x86_64.rpm
wget http://s3.amazonaws.com/downloads.basho.com/riak-cs-control/1.0/1.0.0/rhel/6/riak-cs-control-1.0.0-1.el6.x86_64.rpm
yum localinstall -y riak-*.rpm stanchion-*.rpm
### Install Druid
1. Clone the git repository for druid, checkout a "stable" tag and build
git clone https://github.com/metamx/druid.git druid
pushd druid
git checkout druid-0.4.12
export LANGUAGE=C
export LC_MESSAGE=C
export LC_ALL=C
export LANG=en_US
./build.sh
mkdir -p /var/lib/druid/app
cp ./services/target/druid-services-*-selfcontained.jar /var/lib/druid/app
ln -s /var/lib/druid/app/druid-services-*-selfcontained.jar /var/lib/druid/app/druid-services.jar
popd
### Configure stuff
1. Add this line to /etc/hosts
echo "127.0.0.1 s3.amazonaws.com bucket.s3.amazonaws.com `hostname`" >> /etc/hosts
NOTE: the bucket name in this case is "bucket", but you might need to update it to your bucket name if you want to use a different bucket name.
1. Download and extract run scripts and configuration files:
wget http://static.druid.io/artifacts/scripts/druid_scripts_nebrera.tar /
pushd /
tar xvf ~/druid_scripts_nebrera.tar
popd
1. Start Riak in order to create a user:
/etc/init.d/riak start
/etc/init.d/riak-cs start
/etc/init.d/stanchion start
You can check riak status using:
riak-admin member-status
You should expect results like
Attempting to restart script through sudo -H -u riak
================================= Membership ==================================
Status Ring Pending Node
-------------------------------------------------------------------------------
valid 100.0% -- 'riak@127.0.0.1'
-------------------------------------------------------------------------------
Valid:1 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
1. Create riak-cs user and yoink out credentials.
curl -H 'Content-Type: application/json' -X POST http://127.0.0.1:8088/riak-cs/user --data '{"email":"example@domain.com", "name":"admin"}' >> /tmp/riak_user.json
export RIAK_KEY_ID=`sed 's/^.*"key_id":"//' /tmp/riak_user.json | cut -d '"' -f 1`
export RIAK_KEY_SECRET=`sed 's/^.*"key_secret":"//' /tmp/riak_user.json | cut -d '"' -f 1`
sed -i "s/<%=[ ]*@key_id[ ]*%>/${RIAK_KEY_ID}/" /etc/riak-cs/app.config /etc/riak-cs-control/app.config /etc/stanchion/app.config /etc/druid/config.sh /etc/druid/base.properties /root/.s3cfg
sed -i "s/<%=[ ]*@key_secret[ ]*%>/${RIAK_KEY_SECRET}/" /etc/riak-cs/app.config /etc/riak-cs-control/app.config /etc/stanchion/app.config /etc/druid/config.sh /etc/druid/base.properties /root/.s3cfg
This will store the result of creating the user into `/tmp/riak_user.json`. You can look at it if you are interested. It will look something like this
{"email":"example@domain.com",
"display_name":"example",
"name":"admin",
"key_id":"DOXKZYR_QM2S-7HSKAEU",
"key_secret":"GtvVJow068RM-_viHIYR9DWMAXsFcL1SmjuNfA==",
"id":"4c5b5468c180f3efafd531b6cd8e2bb24371d99640aad5ced5fbbc0604fc473d",
"status":"enabled"}
1. Stop riak-cs:
/etc/init.d/riak-cs stop
/etc/init.d/stanchion stop
/etc/init.d/riak stop
1. Disable anonymous user creation
sed -i 's/{[ ]*anonymous_user_creation[ ]*,[ ]*true[ ]*}/{anonymous_user_creation, false}/' /etc/riak-cs/app.config
grep anonymous_user_creation /etc/riak-cs/app.config
1. Restart riak-cs services:
/etc/init.d/riak start
/etc/init.d/riak-cs start
/etc/init.d/stanchion start
1. Create your bucket. The bucket name used in this example and in the config files is "bucket"
s3cmd mb s3://bucket
You can verify that the bucket is created with:
s3cmd ls
1. Start MySQL server
service mysqld start
chkconfig mysqld on
/usr/bin/mysqladmin -u root password 'riakdruid'
NOTE: If you don't like "riakdruid" as your password, feel free to change it around.
NOTE: This example connects to the database as the root user. In a real deployment you should use a dedicated, less-privileged user; root is used here only to keep the example simple.
1. Start zookeeper and kafka
/etc/init.d/zookeeper start
/etc/init.d/kafka start
1. Start druid
/etc/init.d/druid_master start
/etc/init.d/druid_realtime start
/etc/init.d/druid_broker start
/etc/init.d/druid_compute start
Numerous backend engineers at [Metamarkets](http://www.metamarkets.com) work on Druid full-time. If you have any questions about usage or code, feel free to contact any of us.
Google Groups Mailing List
--------------------------
The best place for questions is through our mailing list:
[druid-development@googlegroups.com](mailto:druid-development@googlegroups.com)
[https://groups.google.com/d/forum/druid-development](https://groups.google.com/d/forum/druid-development)
IRC
---
Several of us also hang out in the channel \#druid-dev on irc.freenode.net.
Tasks are run on workers and always operate on a single datasource. Once an indexer coordinator node accepts a task, a lock is created for the datasource and interval specified in the task. Tasks do not need to explicitly release locks; they are released upon task completion. Tasks may potentially release locks early if they desire. Task ids are made unique by naming them using UUIDs or the timestamp at which the task was created. Tasks are also part of a “task group”, which is a set of tasks that can share interval locks.
There are several different types of tasks.
Append Task
-----------
Append tasks append a list of segments together into a single segment (one after the other). The grammar is:
{
"id": <task_id>,
"dataSource": <task_datasource>,
"segments": <JSON list of DataSegment objects to append>
}
Merge Task
----------
Merge tasks merge a list of segments together. Any common timestamps are merged. The grammar is:
{
"id": <task_id>,
"dataSource": <task_datasource>,
"segments": <JSON list of DataSegment objects to append>
}
Delete Task
-----------
Delete tasks create empty segments with no data. The grammar is:
{
"id": <task_id>,
"dataSource": <task_datasource>,
"segments": <JSON list of DataSegment objects to append>
}
Kill Task
---------
Kill tasks delete all information about a segment and remove it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:
{
"id": <task_id>,
"dataSource": <task_datasource>,
"segments": <JSON list of DataSegment objects to append>
}
Index Task
----------
Index Partitions Task
---------------------
Index Generator Task
--------------------
Index Hadoop Task
-----------------
Index Realtime Task
-------------------
Version Converter Task
----------------------
Version Converter SubTask
-------------------------
YourKit supports the Druid open source projects with its
full-featured Java Profiler.
YourKit, LLC is the creator of innovative and intelligent tools for profiling
Java and .NET applications. Take a look at YourKit's software products:
<a href="http://www.yourkit.com/java/profiler/index.jsp">YourKit Java
Profiler</a> and
<a href="http://www.yourkit.com/.net/profiler/index.jsp">YourKit .NET
Profiler</a>.
Time boundary queries return the earliest and latest data points of a data set. The grammar is:
<code>{
"queryType" : "timeBoundary",
"dataSource": "sample_datasource"
}
</code>
There are 3 main parts to a time boundary query:
|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be “timeBoundary”; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String defining the data source to query, very similar to a table in a relational database|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|
The format of the result is:
<code>[ {
"timestamp" : "2013-05-09T18:24:00.000Z",
"result" : {
"minTime" : "2013-05-09T18:24:00.000Z",
"maxTime" : "2013-05-09T18:37:00.000Z"
}
} ]
</code>
Timeseries queries
==================
These types of queries take a timeseries query object and return an array of JSON objects where each object represents a value asked for by the timeseries query.
An example timeseries query object is shown below:
<code>
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "sample_dimension1",
        "value": "sample_value1"
      },
      {
        "type": "or",
        "fields": [
          {
            "type": "selector",
            "dimension": "sample_dimension2",
            "value": "sample_value2"
          },
          {
            "type": "selector",
            "dimension": "sample_dimension3",
            "value": "sample_value3"
          }
        ]
      }
    ]
  },
  "aggregations": [
    {
      "type": "longSum",
      "name": "sample_name1",
      "fieldName": "sample_fieldName1"
    },
    {
      "type": "doubleSum",
      "name": "sample_name2",
      "fieldName": "sample_fieldName2"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "sample_name1",
          "fieldName": "sample_fieldName1"
        },
        {
          "type": "fieldAccess",
          "name": "sample_name2",
          "fieldName": "sample_fieldName2"
        }
      ]
    }
  ],
  "intervals": [
    "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"
  ]
}
</code>
There are 7 main parts to a timeseries query:
|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be “timeseries”; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String defining the data source to query, very similar to a table in a relational database|yes|
|granularity|Defines the granularity of the query. See [[Granularities]]|yes|
|filter|See [[Filters]]|no|
|aggregations|See [[Aggregations]]|yes|
|postAggregations|See [[Post Aggregations]]|no|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|
To pull it all together, the above query would return 2 data points, one for each day between 2012-01-01 and 2012-01-03, from the “sample\_datasource” table. Each data point would be the (long) sum of sample\_fieldName1, the (double) sum of sample\_fieldName2, and the (double) result of sample\_fieldName1 divided by sample\_fieldName2 for the filter set. The output looks like this:
<pre><code>
[
  {
    "timestamp" : "2012-01-01T00:00:00.000Z",
    "result" : {
      "sample_name1" : <some_value>,
      "sample_name2" : <some_value>,
      "sample_divide" : <some_value>
    }
  },
  {
    "timestamp" : "2012-01-02T00:00:00.000Z",
    "result" : {
      "sample_name1" : <some_value>,
      "sample_name2" : <some_value>,
      "sample_divide" : <some_value>
    }
  }
]
</code></pre>
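Issuing a timeseries query follows the same pattern as the other query types: save the JSON to a file and POST it to a broker or realtime node. The host, port, and filename below are assumptions borrowed from the tutorials later in this document:

<code>
curl -X POST 'http://localhost:8083/druid/v2/?pretty' \
     -H 'content-type: application/json' \
     -d @timeseries_query.body
</code>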
Greetings! This tutorial will help clarify some core Druid concepts. We will use a realtime dataset and issue some basic Druid queries. If you are ready to explore Druid, and learn a thing or two, read on!
About the data
--------------
The data source we’ll be working with is Wikipedia edits. Each time an edit is made in Wikipedia, an event gets pushed to an IRC channel associated with the language of the Wikipedia page. We scrape IRC channels for several different languages and load this data into Druid.
Each event has a timestamp indicating the time of the edit (in UTC time), a list of dimensions indicating various metadata about the event (such as information about the user editing the page and where the user resides), and a list of metrics associated with the event (such as the number of characters added and deleted).
Specifically, the data schema looks like this:
Dimensions (things to filter on):
```json
"page"
"language"
"user"
"unpatrolled"
"newPage"
"robot"
"anonymous"
"namespace"
"continent"
"country"
"region"
"city"
```
Metrics (things to aggregate over):
```json
"count"
"added"
"delta"
"deleted"
```
These metrics track the number of characters added, deleted, and changed.
Setting Up
----------
There are two ways to set up Druid: download a tarball, or [[Build From Source]]. You only need to do one of these.
### Download a Tarball
We’ve built a tarball that contains everything you’ll need. You’ll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.5.54-bin.tar.gz)
Download this file to a directory of your choosing.
You can extract the awesomeness within by issuing:
tar -zxvf druid-services-*-bin.tar.gz
Not too lost so far right? That’s great! If you cd into the directory:
cd druid-services-0.5.54
You should see a bunch of files:
* run_example_server.sh
* run_example_client.sh
* LICENSE, config, examples, lib directories
Running Example Scripts
-----------------------
Let’s start doing stuff. You can start a Druid [[Realtime]] node by issuing:
./run_example_server.sh
Select “wikipedia”.
Once the node starts up you will see a bunch of logs about setting up properties and connecting to the data source. If everything was successful, you should see messages of the form shown below.
<code>
2013-07-19 21:54:05,154 INFO [main] com.metamx.druid.realtime.RealtimeNode - Starting Jetty
2013-07-19 21:54:05,154 INFO [main] org.mortbay.log - jetty-6.1.x
2013-07-19 21:54:05,171 INFO [chief-wikipedia] com.metamx.druid.realtime.plumber.RealtimePlumberSchool - Expect to run at [2013-07-19T22:03:00.000Z]
2013-07-19 21:54:05,246 INFO [main] org.mortbay.log - Started SelectChannelConnector@0.0.0.0:8083
</code>
The Druid real-time node ingests events into an in-memory buffer. Periodically, these events will be persisted to disk. If you are interested in the details of our real-time architecture and why we persist indexes to disk, I suggest you read our [White Paper](http://static.druid.io/docs/druid.pdf).
Okay, things are about to get real (real-time, that is). To query the real-time node you’ve spun up, you can issue:
<pre>./run_example_client.sh</pre>
Select “wikipedia” once again. This script issues [[GroupByQuery]]s against the data we’ve been ingesting. The query looks like this:
```json
{
  "queryType": "groupBy",
  "dataSource": "wikipedia",
  "granularity": "minute",
  "dimensions": [
    "page"
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "rows"
    },
    {
      "type": "longSum",
      "fieldName": "edit_count",
      "name": "count"
    }
  ],
  "filter": {
    "type": "selector",
    "dimension": "namespace",
    "value": "article"
  },
  "intervals": [
    "2013-06-01T00:00/2020-01-01T00"
  ]
}
```
This is a **groupBy** query, which you may be familiar with from SQL. We are grouping, or aggregating, via the **dimensions** field: **“page”**. We are **filtering** via the **“namespace”** dimension, to only look at edits on **“articles”**. Our **aggregations** are what we are calculating: a count of the number of data rows, and a count of the number of edits that have occurred.
The result looks something like this:
```json
[
  {
    "version" : "v1",
    "timestamp" : "2013-09-04T21:44:00.000Z",
    "event" : {
      "count" : 0,
      "page" : "2013\u201314_Brentford_F.C._season",
      "rows" : 1
    }
  },
  {
    "version" : "v1",
    "timestamp" : "2013-09-04T21:44:00.000Z",
    "event" : {
      "count" : 0,
      "page" : "8e\u00e9tape_du_Tour_de_France_2013",
      "rows" : 1
    }
  },
  {
    "version" : "v1",
    "timestamp" : "2013-09-04T21:44:00.000Z",
    "event" : {
      "count" : 0,
      "page" : "Agenda_of_the_Tea_Party_movement",
      "rows" : 1
    }
  },
```
This groupBy query is a bit complicated and we’ll return to it later. For the time being, just make sure you are getting some blocks of data back. If you are having problems, make sure you have [curl](http://curl.haxx.se/) installed. Control+C to break out of the client script.
Querying Druid
--------------
In your favorite editor, create the file:
<pre>time_boundary_query.body</pre>
Druid queries are JSON blobs which are relatively painless to create programmatically, but an absolute pain to write by hand. So anyway, we are going to create a Druid query by hand. Add the following to the file you just created:
<pre><code>
{
  "queryType" : "timeBoundary",
  "dataSource" : "wikipedia"
}
</code></pre>
The [[TimeBoundaryQuery]] is one of the simplest Druid queries. To run the query, you can issue:
<pre><code>curl -X POST 'http://localhost:8083/druid/v2/?pretty' -H 'content-type: application/json' -d @time_boundary_query.body</code></pre>
We get something like this JSON back:
```json
[ {
"timestamp" : "2013-09-04T21:44:00.000Z",
"result" : {
"minTime" : "2013-09-04T21:44:00.000Z",
"maxTime" : "2013-09-04T21:47:00.000Z"
}
} ]
```
As you can probably tell, the result is indicating the maximum and minimum timestamps we've seen thus far (summarized to a minutely granularity). Let's explore a bit further.
Return to your favorite editor and create the file:
<pre>timeseries_query.body</pre>
We are going to make a slightly more complicated query, the [[TimeseriesQuery]]. Copy and paste the following into the file:
<pre><code>
{
"queryType": "timeseries",
"dataSource": "wikipedia",
"intervals": [
"2010-01-01/2020-01-01"
],
"granularity": "all",
"aggregations": [
{
"type": "longSum",
"fieldName": "count",
"name": "edit_count"
},
{
"type": "doubleSum",
"fieldName": "added",
"name": "chars_added"
}
]
}
</code></pre>
You are probably wondering, what are these [[Granularities]] and [[Aggregations]] things? What the query is doing is aggregating some metrics over some span of time.
To issue the query and get some results, run the following in your command line:
<pre><code>curl -X POST 'http://localhost:8083/druid/v2/?pretty' -H 'content-type: application/json' -d @timeseries_query.body</code>
</pre>
Once again, you should get a JSON blob of text back with your results, that looks something like this:
```json
[ {
  "timestamp" : "2013-09-04T21:44:00.000Z",
  "result" : {
    "chars_added" : 312670.0,
    "edit_count" : 733
  }
} ]
```
If you issue the query again, you should notice your results updating.
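If you want to watch the aggregates grow without retyping the command, a simple shell loop works; this is just a convenience sketch around the same curl call:

<pre><code>
# Re-issue the timeseries query every 5 seconds; Control+C to stop
while true; do
  curl -s -X POST 'http://localhost:8083/druid/v2/?pretty' \
       -H 'content-type: application/json' \
       -d @timeseries_query.body
  sleep 5
done
</code></pre>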
Right now all the results you are getting back are being aggregated into a single timestamp bucket. What if we wanted to see our aggregations on a per minute basis? What field can we change in the query to accomplish this?
If you loudly exclaimed “we can change granularity to minute”, you are absolutely correct! We can specify different granularities to bucket our results, like so:
<code>
{
"queryType": "timeseries",
"dataSource": "wikipedia",
"intervals": [
"2010-01-01/2020-01-01"
],
"granularity": "minute",
"aggregations": [
{
"type": "longSum",
"fieldName": "count",
"name": "edit_count"
},
{
"type": "doubleSum",
"fieldName": "added",
"name": "chars_added"
}
]
}
</code>
This gives us something like the following:
```json
[
  {
    "timestamp" : "2013-09-04T21:44:00.000Z",
    "result" : {
      "chars_added" : 30665.0,
      "edit_count" : 128
    }
  }, {
    "timestamp" : "2013-09-04T21:45:00.000Z",
    "result" : {
      "chars_added" : 122637.0,
      "edit_count" : 167
    }
  }, {
    "timestamp" : "2013-09-04T21:46:00.000Z",
    "result" : {
      "chars_added" : 78938.0,
      "edit_count" : 159
    }
  },
```
Solving a Problem
-----------------
One of Druid’s main powers is to provide answers to problems, so let’s pose a problem. What if we wanted to know what the top pages in the US are, ordered by the number of edits over the last few minutes you’ve been going through this tutorial? To solve this problem, we have to return to the query we introduced at the very beginning of this tutorial, the [[GroupByQuery]]. It would be nice if we could group our results by dimension value and somehow sort those results… and it turns out we can!
Let’s create the file:
<pre>group_by_query.body</pre>
and put the following in there:
<pre><code>
{
"queryType": "groupBy",
"dataSource": "wikipedia",
"granularity": "all",
"dimensions": [
"page"
],
"orderBy": {
"type": "default",
"columns": [
{
"dimension": "edit_count",
"direction": "DESCENDING"
}
],
"limit": 10
},
"aggregations": [
{
"type": "longSum",
"fieldName": "count",
"name": "edit_count"
}
],
"filter": {
"type": "selector",
"dimension": "country",
"value": "United States"
},
"intervals": [
"2012-10-01T00:00/2020-01-01T00"
]
}
</code></pre>
Woah! Our query just got way more complicated. Now we have these [[Filters]] things and this [[OrderBy]] thing. Fear not, it turns out the new objects we’ve introduced to our query can help define the format of our results and provide an answer to our question.
If you issue the query:
<code>curl -X POST 'http://localhost:8083/druid/v2/?pretty' -H 'content-type: application/json' -d @group_by_query.body</code>
You should see an answer to our question. As an example, some results are shown below:
```json
[
  {
    "version" : "v1",
    "timestamp" : "2012-10-01T00:00:00.000Z",
    "event" : {
      "page" : "RTC_Transit",
      "edit_count" : 6
    }
  }, {
    "version" : "v1",
    "timestamp" : "2012-10-01T00:00:00.000Z",
    "event" : {
      "page" : "List_of_Deadly_Women_episodes",
      "edit_count" : 4
    }
  }, {
    "version" : "v1",
    "timestamp" : "2012-10-01T00:00:00.000Z",
    "event" : {
      "page" : "User_talk:David_Biddulph",
      "edit_count" : 4
    }
  },
```
Feel free to tweak other query parameters to answer other questions you may have about the data.
Next Steps
----------
Want to know even more about the Druid Cluster? Check out [[Tutorial: The Druid Cluster]]
Druid is even more fun if you load your own data into it! To learn how to load your data, see [[Loading Your Data]].
Additional Information
----------------------
This tutorial is merely showcasing a small fraction of what Druid can do. If you are interested in more information about Druid, including setting up a more sophisticated Druid cluster, please read the other links in our wiki.
And thus concludes our journey! Hopefully you learned a thing or two about Druid real-time ingestion, querying Druid, and how Druid can be used to solve problems. If you have additional questions, feel free to post in our [google groups page](http://www.groups.google.com/forum/#!forum/druid-development).
Welcome back! In our first [tutorial](https://github.com/metamx/druid/wiki/Tutorial%3A-A-First-Look-at-Druid), we introduced you to the most basic Druid setup: a single realtime node. We streamed in some data and queried it. Realtime nodes collect very recent data and periodically hand that data off to the rest of the Druid cluster. Some questions about the architecture naturally come to mind. What does the rest of the Druid cluster look like? How does Druid load available static data?
This tutorial will hopefully answer these questions!
In this tutorial, we will set up other types of Druid nodes as well as the external dependencies needed for a fully functional Druid cluster. The architecture of Druid is very much like the [Megazord](http://www.youtube.com/watch?v=7mQuHh1X4H4) from the popular 90s show Mighty Morphin' Power Rangers: each Druid node has a specific purpose, and the nodes come together to form a fully functional system.
## Downloading Druid ##
If you followed the first tutorial, you should already have Druid downloaded. If not, let's go back and do that first.
You can download the latest version of druid [here](http://static.druid.io/artifacts/releases/druid-services-0.5.54-bin.tar.gz)
and untar the contents within by issuing:
```bash
tar -zxvf druid-services-*-bin.tar.gz
cd druid-services-*
```
You can also [[Build From Source]].
## External Dependencies ##
Druid requires 3 external dependencies: a "deep" storage that acts as a backup data repository, a relational database such as MySQL to hold configuration and metadata information, and [Apache Zookeeper](http://zookeeper.apache.org/) for coordination among different pieces of the cluster.
For deep storage, we have made a public S3 bucket (static.druid.io) available where data for this particular tutorial can be downloaded. More on the data [later](https://github.com/metamx/druid/wiki/Tutorial-Part-2#the-data).
### Setting up MySQL ###
1. If you don't already have it, download MySQL Community Server here: [http://dev.mysql.com/downloads/mysql/](http://dev.mysql.com/downloads/mysql/)
2. Install MySQL
3. Create a druid user and database
```bash
mysql -u root
```
```sql
GRANT ALL ON druid.* TO 'druid'@'localhost' IDENTIFIED BY 'diurd';
CREATE database druid;
```
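To double-check that the grant and database were created correctly, you can reconnect as the new user (the password `diurd` is the one set above):

```bash
# The output should include the 'druid' database created above
mysql -u druid -pdiurd -e "SHOW DATABASES;"
```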
### Setting up Zookeeper ###
```bash
curl http://www.motorlogy.com/apache/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz -o zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
cd ..
```
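To confirm ZooKeeper is up before starting any Druid nodes, you can send it the standard `ruok` four-letter command. This assumes the `nc` utility is installed and the default client port 2181 from `zoo_sample.cfg`:

```bash
# A healthy ZooKeeper replies with: imok
echo ruok | nc localhost 2181
```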
## The Data ##
Similar to the first tutorial, the data we will be loading is based on edits that have occurred on Wikipedia. Every time someone edits a page in Wikipedia, metadata is generated about the editor and edited page. Druid collects each individual event and packages them together in a container known as a [segment](https://github.com/metamx/druid/wiki/Segments). Segments contain data over some span of time. We've prebuilt a segment for this tutorial and will cover making your own segments in other [pages](https://github.com/metamx/druid/wiki/Loading-Your-Data). The segment we are going to work with has the following format:
Dimensions (things to filter on):
```json
"page"
"language"
"user"
"unpatrolled"
"newPage"
"robot"
"anonymous"
"namespace"
"continent"
"country"
"region"
"city"
```
Metrics (things to aggregate over):
```json
"count"
"added"
"delta"
"deleted"
```
## The Cluster ##
Let's start up a few nodes and download our data. First things first, though: let's create a config directory where we will store configs for our various nodes:
```
mkdir config
```
If you are interested in learning more about Druid configuration files, check out this [link](https://github.com/metamx/druid/wiki/Configuration). Many aspects of Druid are customizable. For the purposes of this tutorial, we are going to use default values for most things.
### Start a Master Node ###
Master nodes are in charge of load assignment and distribution. Master nodes monitor the status of the cluster and command compute nodes to assign and drop segments.
To create the master config file:
```
mkdir config/master
```
Under the directory we just created, create the file ```runtime.properties``` with the following contents:
```
druid.host=127.0.0.1:8082
druid.port=8082
druid.service=master
# logging
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=info
# zk
druid.zk.service.host=localhost
druid.zk.paths.base=/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
# aws (demo user)
com.metamx.aws.accessKey=AKIAIMKECRUYKDQGR6YQ
com.metamx.aws.secretKey=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
# db
druid.database.segmentTable=segments
druid.database.user=druid
druid.database.password=diurd
druid.database.connectURI=jdbc:mysql://localhost:3306/druid
druid.database.ruleTable=rules
druid.database.configTable=config
# master runtime configs
druid.master.startDelay=PT60S
```
To start the master node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/master com.metamx.druid.http.MasterMain
```
### Start a Compute Node ###
Compute nodes are the workhorses of a cluster and are in charge of loading historical segments and making them available for queries. Our Wikipedia segment will be downloaded by a compute node.
To create the compute config file:
```
mkdir config/compute
```
Under the directory we just created, create the file ```runtime.properties``` with the following contents:
```
druid.host=127.0.0.1:8081
druid.port=8081
druid.service=compute
# logging
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=info
# zk
druid.zk.service.host=localhost
druid.zk.paths.base=/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
# processing
druid.processing.buffer.sizeBytes=10000000
# aws (demo user)
com.metamx.aws.accessKey=AKIAIMKECRUYKDQGR6YQ
com.metamx.aws.secretKey=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
# Path on local FS for storage of segments; dir will be created if needed
druid.paths.indexCache=/tmp/druid/indexCache
# Path on local FS for storage of segment metadata; dir will be created if needed
druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
# server
druid.server.maxSize=100000000
```
To start the compute node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/compute com.metamx.druid.http.ComputeMain
```
### Start a Broker Node ###
Broker nodes are responsible for figuring out which compute and/or realtime nodes correspond to which queries. They also merge partial results from these nodes in a scatter/gather fashion.
To create the broker config file:
```
mkdir config/broker
```
Under the directory we just created, create the file ```runtime.properties``` with the following contents:
```
druid.host=127.0.0.1:8080
druid.port=8080
druid.service=broker
# logging
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=info
# zk
druid.zk.service.host=localhost
druid.zk.paths.base=/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
# thread pool size for servicing queries
druid.client.http.connections=10
```
To start the broker node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/broker com.metamx.druid.http.BrokerMain
```
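At this point you should have three nodes running. A quick, optional sanity check (assuming the `nc` utility is available) is to verify that each of the ports configured above is listening:

```bash
# 8082 = master, 8081 = compute, 8080 = broker
for port in 8082 8081 8080; do
  nc -z 127.0.0.1 $port && echo "port $port is up" || echo "port $port is NOT listening"
done
```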
<!--
### Optional: Start a Realtime Node ###
```
druid.host=127.0.0.1:8083
druid.port=8083
druid.service=realtime
# logging
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=info
# zk
druid.zk.service.host=localhost
druid.zk.paths.base=/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
# processing
druid.processing.buffer.sizeBytes=10000000
# schema
druid.realtime.specFile=realtime.spec
# aws
com.metamx.aws.accessKey=dummy_access_key
com.metamx.aws.secretKey=dummy_secret_key
# db
druid.database.segmentTable=segments
druid.database.user=druid
druid.database.password=diurd
druid.database.connectURI=jdbc:mysql://localhost:3306/druid
druid.database.ruleTable=rules
druid.database.configTable=config
# Path on local FS for storage of segments; dir will be created if needed
druid.paths.indexCache=/tmp/druid/indexCache
# handoff
druid.pusher.s3.bucket=dummy_s3_bucket
druid.pusher.s3.baseKey=dummy_key
```
To start the realtime node:
```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath services/target/druid-services-*-selfcontained.jar:config/realtime com.metamx.druid.realtime.RealtimeMain
```
-->
## Loading the Data ##
The MySQL dependency we introduced earlier on contains a 'segments' table that contains entries for segments that should be loaded into our cluster. The Druid master compares this table with segments that already exist in the cluster to determine what should be loaded and dropped. To load our wikipedia segment, we need to create an entry in our MySQL segment table.
Usually, when new segments are created, these MySQL entries are created directly so you never have to do this by hand. For this tutorial, we can do this manually by going back into MySQL and issuing:
```
use druid;
```
```sql
INSERT INTO segments (id, dataSource, created_date, start, end, partitioned, version, used, payload) VALUES ('wikipedia_2013-08-01T00:00:00.000Z_2013-08-02T00:00:00.000Z_2013-08-08T21:22:48.989Z', 'wikipedia', '2013-08-08T21:26:23.799Z', '2013-08-01T00:00:00.000Z', '2013-08-02T00:00:00.000Z', '0', '2013-08-08T21:22:48.989Z', '1', '{\"dataSource\":\"wikipedia\",\"interval\":\"2013-08-01T00:00:00.000Z/2013-08-02T00:00:00.000Z\",\"version\":\"2013-08-08T21:22:48.989Z\",\"loadSpec\":{\"type\":\"s3_zip\",\"bucket\":\"static.druid.io\",\"key\":\"data/segments/wikipedia/20130801T000000.000Z_20130802T000000.000Z/2013-08-08T21_22_48.989Z/0/index.zip\"},\"dimensions\":\"dma_code,continent_code,geo,area_code,robot,country_name,network,city,namespace,anonymous,unpatrolled,page,postal_code,language,newpage,user,region_lookup\",\"metrics\":\"count,delta,variation,added,deleted\",\"shardSpec\":{\"type\":\"none\"},\"binaryVersion\":9,\"size\":24664730,\"identifier\":\"wikipedia_2013-08-01T00:00:00.000Z_2013-08-02T00:00:00.000Z_2013-08-08T21:22:48.989Z\"}');
```
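You can verify that the row made it in, and that the segment is enabled (`used = 1`), with a quick query; the credentials are the ones created earlier in this tutorial:

```bash
mysql -u druid -pdiurd druid -e "SELECT id, dataSource, used FROM segments;"
```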
If you look in your master node logs, you should, after a maximum of a minute or so, see logs of the following form:
```
2013-08-08 22:48:41,967 INFO [main-EventThread] com.metamx.druid.master.LoadQueuePeon - Server[/druid/loadQueue/127.0.0.1:8081] done processing [/druid/loadQueue/127.0.0.1:8081/wikipedia_2013-08-01T00:00:00.000Z_2013-08-02T00:00:00.000Z_2013-08-08T21:22:48.989Z]
2013-08-08 22:48:41,969 INFO [ServerInventoryView-0] com.metamx.druid.client.SingleServerInventoryView - Server[127.0.0.1:8081] added segment[wikipedia_2013-08-01T00:00:00.000Z_2013-08-02T00:00:00.000Z_2013-08-08T21:22:48.989Z]
```
When the segment has finished downloading and is ready for queries, you should see the following message in your compute node logs:
```
2013-08-08 22:48:41,959 INFO [ZkCoordinator-0] com.metamx.druid.coordination.BatchDataSegmentAnnouncer - Announcing segment[wikipedia_2013-08-01T00:00:00.000Z_2013-08-02T00:00:00.000Z_2013-08-08T21:22:48.989Z] at path[/druid/segments/127.0.0.1:8081/2013-08-08T22:48:41.959Z]
```
At this point, we can query the segment. For more information on querying, see this [link](https://github.com/metamx/druid/wiki/Querying).
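For example, a timeBoundary query for the wikipedia datasource can be POSTed to the broker we started on port 8080. This is a sketch that assumes the broker exposes the same query path the realtime node used in the first tutorial; adjust host and port to your setup:

```bash
curl -X POST 'http://localhost:8080/druid/v2/?pretty' \
     -H 'content-type: application/json' \
     -d '{"queryType":"timeBoundary","dataSource":"wikipedia"}'
```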
## Next Steps ##
Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data?
Check out the [Loading Your Own Data](https://github.com/metamx/druid/wiki/Loading-Your-Data) section for more info!
Greetings! This tutorial will help clarify some core Druid concepts. We will use a realtime dataset and issue some basic Druid queries. If you are ready to explore Druid, and learn a thing or two, read on!
About the data
--------------
The data source we’ll be working with is the Bit.ly USA Government website statistics stream. You can see the stream [here](http://developer.usa.gov/1usagov), and read about the stream [here](http://www.usa.gov/About/developer-resources/1usagov.shtml). This is a feed of JSON data that gets updated whenever anyone clicks a bit.ly-shortened USA.gov link. A typical event might look something like this:
```json
{
  "user_agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
  "country": "US",
  "known_user": 1,
  "timezone": "America/New_York",
  "geo_region": "DC",
  "global_bitly_hash": "17ctAFs",
  "encoding_user_bitly_hash": "17ctAFr",
  "encoding_user_login": "senrubiopress",
  "aaccept_language": "en-US",
  "short_url_cname": "1.usa.gov",
  "referring_url": "http://t.co/4Av4NUFAYq",
  "long_url": "http://www.rubio.senate.gov/public/index.cfm/fighting-for-florida?ID=c8357d12-9da8-4e9d-b00d-7168e1bf3599",
  "timestamp": 1372190407,
  "timestamp of time hash was created": 1372190097,
  "city": "Washington",
  "latitude_longitude": [
    38.893299,
    -77.014603
  ]
}
```
The “known\_user” field is always 1 or 0. It is 1 if the user is known to the server, and 0 otherwise. We will use this field extensively in this demo.
Setting Up
----------
There are two ways to set up Druid: download a tarball, or [[Build From Source]]. You only need to do one of these.
### Download a Tarball
We’ve built a tarball that contains everything you’ll need. You’ll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.5.50-bin.tar.gz)
Download this file to a directory of your choosing.
You can extract the awesomeness within by issuing:
<pre>tar -zxvf druid-services-*-bin.tar.gz</pre>
Not too lost so far right? That’s great! If you cd into the directory:
<pre>cd druid-services-0.5.50</pre>
You should see a bunch of files:
* run_example_server.sh
* run_example_client.sh
* LICENSE, config, examples, lib directories
Running Example Scripts
-----------------------
Let’s start doing stuff. You can start a Druid [[Realtime]] node by issuing:
<pre>./run_example_server.sh</pre>
Select “webstream”.
Once the node starts up you will see a bunch of logs about setting up properties and connecting to the data source. If everything was successful, you should see messages of the form shown below.
<pre><code>
2013-07-19 21:54:05,154 INFO com.metamx.druid.realtime.RealtimeNode - Starting Jetty
2013-07-19 21:54:05,154 INFO org.mortbay.log - jetty-6.1.x
2013-07-19 21:54:05,171 INFO com.metamx.druid.realtime.plumber.RealtimePlumberSchool - Expect to run at
2013-07-19 21:54:05,246 INFO org.mortbay.log - Started SelectChannelConnector@0.0.0.0:8083
</code></pre>
The Druid real-time node ingests events into an in-memory buffer. Periodically, these events will be persisted to disk. If you are interested in the details of our real-time architecture and why we persist indexes to disk, I suggest you read our [White Paper](http://static.druid.io/docs/druid.pdf).
Okay, things are about to get real. To query the real-time node you’ve spun up, you can issue:
<pre>./run_example_client.sh</pre>
Select “webstream” once again. This script issues [[GroupByQuery]]s against the data we’ve been ingesting. The query looks like this:
```json
{
  "queryType": "groupBy",
  "dataSource": "webstream",
  "granularity": "minute",
  "dimensions": [
    "timezone"
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "rows"
    },
    {
      "type": "doubleSum",
      "fieldName": "known_users",
      "name": "known_users"
    }
  ],
  "filter": {
    "type": "selector",
    "dimension": "country",
    "value": "US"
  },
  "intervals": [
    "2013-06-01T00:00/2020-01-01T00"
  ]
}
```
This is a **groupBy** query, which you may be familiar with from SQL. We are grouping, or aggregating, via the **dimensions** field: **“timezone”**. We are **filtering** via the **“country”** dimension, to only look at website hits in the US. Our **aggregations** are what we are calculating: a row count, and the sum of the number of known users in our data.
The result looks something like this:
```json
[
  {
    "version" : "v1",
    "timestamp" : "2013-07-18T19:39:00.000Z",
    "event" : {
      "timezone" : "America/Chicago",
      "known_users" : 10,
      "rows" : 15
    }
  },
  {
    "version" : "v1",
    "timestamp" : "2013-07-18T19:39:00.000Z",
    "event" : {
      "timezone" : "America/Los_Angeles",
      "known_users" : 0,
      "rows" : 3
    }
  },
```
This groupBy query is a bit complicated and we’ll return to it later. For the time being, just make sure you are getting some blocks of data back. If you are having problems, make sure you have [curl](http://curl.haxx.se/) installed. Control+C to break out of the client script.
Querying Druid
--------------
In your favorite editor, create the file:
<pre>time_boundary_query.body</pre>
Druid queries are JSON blobs which are relatively painless to create programmatically, but an absolute pain to write by hand. So anyway, we are going to create a Druid query by hand. Add the following to the file you just created:
<pre><code>
{
  "queryType" : "timeBoundary",
  "dataSource" : "webstream"
}
</code></pre>
The [[TimeBoundaryQuery]] is one of the simplest Druid queries. To run the query, you can issue:
<pre><code>curl -X POST 'http://localhost:8083/druid/v2/?pretty' -H 'content-type: application/json' -d @time_boundary_query.body</code></pre>
We get something like this JSON back:
```json
[
{
"timestamp": "2013-07-18T19:39:00.000Z",
"result": {
"minTime": "2013-07-18T19:39:00.000Z",
"maxTime": "2013-07-18T19:46:00.000Z"
}
}
]
```
As you can probably tell, the result is indicating the maximum and minimum timestamps we've seen thus far (summarized to a minutely granularity). Let's explore a bit further.
Return to your favorite editor and create the file:
<pre>timeseries_query.body</pre>
We are going to make a slightly more complicated query, the [[TimeseriesQuery]]. Copy and paste the following into the file:
<pre><code>
{
"queryType": "timeseries",
"dataSource": "webstream",
"intervals": [
"2010-01-01/2020-01-01"
],
"granularity": "all",
"aggregations": [
{
"type": "count",
"name": "rows"
},
{
"type": "doubleSum",
"fieldName": "known_users",
"name": "known_users"
}
]
}
</code></pre>
You are probably wondering, what are these [[Granularities]] and [[Aggregations]] things? What the query is doing is aggregating some metrics over some span of time.
To issue the query and get some results, run the following in your command line:
<pre><code>curl -X POST 'http://localhost:8083/druid/v2/?pretty' -H 'content-type: application/json' -d @timeseries_query.body</code>
</pre>
Once again, you should get a JSON blob of text back with your results, that looks something like this:
```json
[
  {
    "timestamp" : "2013-07-18T19:39:00.000Z",
    "result" : {
      "known_users" : 787.0,
      "rows" : 2004
    }
  }
]
```
If you issue the query again, you should notice your results updating.
Right now all the results you are getting back are being aggregated into a single timestamp bucket. What if we wanted to see our aggregations on a per minute basis? What field can we change in the query to accomplish this?
If you loudly exclaimed “we can change granularity to minute”, you are absolutely correct! We can specify different granularities to bucket our results, like so:
<code>
{
"queryType": "timeseries",
"dataSource": "webstream",
"intervals": [
"2010-01-01/2020-01-01"
],
"granularity": "minute",
"aggregations": [
{
"type": "count",
"name": "rows"
},
{
"type": "doubleSum",
"fieldName": "known_users",
"name": "known_users"
}
]
}
</code>
This gives us something like the following:
```json
[
  {
    "timestamp" : "2013-07-18T19:39:00.000Z",
    "result" : {
      "known_users" : 33,
      "rows" : 76
    }
  },
  {
    "timestamp" : "2013-07-18T19:40:00.000Z",
    "result" : {
      "known_users" : 105,
      "rows" : 221
    }
  },
  {
    "timestamp" : "2013-07-18T19:41:00.000Z",
    "result" : {
      "known_users" : 53,
      "rows" : 167
    }
  },
```
Solving a Problem
-----------------
One of Druid’s main powers is to provide answers to problems, so let’s pose a problem. What if we wanted to know what the top states in the US are, ordered by the number of visits by known users over the last few minutes? To solve this problem, we have to return to the query we introduced at the very beginning of this tutorial, the [[GroupByQuery]]. It would be nice if we could group our results by dimension value and somehow sort those results… and it turns out we can!
Let’s create the file:
<pre>group_by_query.body</pre>
and put the following in there:
<pre><code>
{
"queryType": "groupBy",
"dataSource": "webstream",
"granularity": "all",
"dimensions": [
"geo_region"
],
"orderBy": {
"type": "default",
"columns": [
{
"dimension": "known_users",
"direction": "DESCENDING"
}
],
"limit": 10
},
"aggregations": [
{
"type": "count",
"name": "rows"
},
{
"type": "doubleSum",
"fieldName": "known_users",
"name": "known_users"
}
],
"filter": {
"type": "selector",
"dimension": "country",
"value": "US"
},
"intervals": [
"2012-10-01T00:00/2020-01-01T00"
]
}
</code></pre>
Woah! Our query just got way more complicated. Now we have these [[Filters]] things and this [[OrderBy]] thing. Fear not, it turns out the new objects we’ve introduced to our query can help define the format of our results and provide an answer to our question.
If you issue the query:
<code>curl -X POST 'http://localhost:8083/druid/v2/?pretty' -H 'content-type: application/json' -d @group_by_query.body</code>
You should see an answer to our question. For my stream, it looks like this:
```json
[
  {
    "version" : "v1",
    "timestamp" : "2012-10-01T00:00:00.000Z",
    "event" : {
      "geo_region" : "RI",
      "known_users" : 359,
      "rows" : 143
    }
  },
  {
    "version" : "v1",
    "timestamp" : "2012-10-01T00:00:00.000Z",
    "event" : {
      "geo_region" : "NY",
      "known_users" : 187,
      "rows" : 322
    }
  },
  {
    "version" : "v1",
    "timestamp" : "2012-10-01T00:00:00.000Z",
    "event" : {
      "geo_region" : "CA",
      "known_users" : 145,
      "rows" : 466
    }
  },
  {
    "version" : "v1",
    "timestamp" : "2012-10-01T00:00:00.000Z",
    "event" : {
      "geo_region" : "IL",
      "known_users" : 121,
      "rows" : 185
    }
  },
```
Feel free to tweak other query parameters to answer other questions you may have about the data.
Next Steps
----------
Want to know even more about the Druid Cluster? Check out [[Tutorial: The Druid Cluster]]
Druid is even more fun if you load your own data into it! To learn how to load your data, see [[Loading Your Data]].
Additional Information
----------------------
This tutorial is merely showcasing a small fraction of what Druid can do. If you are interested in more information about Druid, including setting up a more sophisticated Druid cluster, please read the other links in our wiki.
And thus concludes our journey! Hopefully you learned a thing or two about Druid real-time ingestion, querying Druid, and how Druid can be used to solve problems. If you have additional questions, feel free to post in our [google groups page](http://www.groups.google.com/forum/#!forum/druid-development).
This page discusses how we do versioning and provides information on our stable releases.
Versioning Strategy
-------------------
We generally follow [semantic versioning](http://semver.org/). The general idea is
- “Major” version (leftmost): backwards incompatible, no guarantees exist about APIs between the versions
- “Minor” version (middle number): you can move forward from a smaller number to a larger number, but moving backwards *might* be incompatible.
- “bug-fix” version (“patch” or the rightmost): Interchangeable. The higher the number, the more things are fixed (hopefully), but the programming interfaces are completely compatible and you should be able to just drop in a new jar and have it work.
Note that this is defined in terms of programming API, **not** in terms of functionality. It is possible that a brand new awesome way of doing something is introduced in a “bug-fix” release version if it doesn’t add to the public API or change it.
One exception, for right now: while we are still in major version 0, we consider the APIs to be in beta and are conflating “major” and “minor”, so a minor version increase could be backwards incompatible for as long as we are at major version 0. These changes will be communicated via email on the group.
For external deployments, we recommend running the stable release tag. Releases are considered stable after we have deployed them into our production environment and they have operated bug-free for some time.
Tagging strategy
----------------
Tags of the codebase are equivalent to release candidates. We tag the code every time we want to take it through our release process, which includes some QA cycles and deployments. So, it is not safe to assume that a tag is a stable release; it is a solidification of the code as it goes through our production QA cycle and deployment. Tags will never change, but we often go through a number of iterations of tags before actually getting a stable release onto production. So, if you are not sure what is on a tag, it is recommended that you stick to the stable releases listed on the [[Download]] page.
Druid uses ZooKeeper (ZK) for management of current cluster state. The operations that happen over ZK are
1. [[Master]] leader election
2. Segment “publishing” protocol from [[Compute]] and [[Realtime]]
3. Segment load/drop protocol between [[Master]] and [[Compute]]
### Property Configuration
ZooKeeper paths are set via the `runtime.properties` configuration file. Druid will automatically create paths that do not exist, so typos in config files are a very easy way to become split-brained.
There is a prefix path that is required and can be used as the only (well, kinda, see the note below) path-related zookeeper configuration parameter (everything else will be a default based on the prefix):
druid.zk.paths.base
You can also override each individual path (defaults are shown below):
druid.zk.paths.propertiesPath=${druid.zk.paths.base}/properties
druid.zk.paths.announcementsPath=${druid.zk.paths.base}/announcements
druid.zk.paths.servedSegmentsPath=${druid.zk.paths.base}/servedSegments
druid.zk.paths.loadQueuePath=${druid.zk.paths.base}/loadQueue
druid.zk.paths.masterPath=${druid.zk.paths.base}/master
druid.zk.paths.indexer.announcementsPath=${druid.zk.paths.base}/indexer/announcements
druid.zk.paths.indexer.tasksPath=${druid.zk.paths.base}/indexer/tasks
druid.zk.paths.indexer.statusPath=${druid.zk.paths.base}/indexer/status
druid.zk.paths.indexer.leaderLatchPath=${druid.zk.paths.base}/indexer/leaderLatchPath
NOTE: We also use Curator’s service discovery module to expose some services via zookeeper. This also uses a zookeeper path, but this path is **not** affected by `druid.zk.paths.base` and **must** be specified separately. This property is
druid.zk.paths.discoveryPath
### Master Leader Election
We use the Curator LeadershipLatch recipe to do leader election at path
${druid.zk.paths.masterPath}/_MASTER
### Segment “publishing” protocol from Compute and Realtime
The `announcementsPath` and `servedSegmentsPath` are used for this.
All [[Compute]] and [[Realtime]] nodes publish themselves on the `announcementsPath`, specifically, they will create an ephemeral znode at
${druid.zk.paths.announcementsPath}/${druid.host}
Which signifies that they exist. They will also subsequently create a permanent znode at
${druid.zk.paths.servedSegmentsPath}/${druid.host}
And as they load up segments, they will attach ephemeral znodes that look like
${druid.zk.paths.servedSegmentsPath}/${druid.host}/_segment_identifier_
Nodes like the [[Master]] and [[Broker]] can then watch these paths to see which nodes are currently serving which segments.
### Segment load/drop protocol between Master and Compute
The `loadQueuePath` is used for this.
When the [[Master]] decides that a [[Compute]] node should load or drop a segment, it writes an ephemeral znode to
${druid.zk.paths.loadQueuePath}/_host_of_compute_node/_segment_identifier
This node will contain a payload that indicates to the Compute node what it should do with the given segment. When the Compute node is done with the work, it will delete the znode in order to signify to the Master that it is complete.
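If you want to watch this protocol in action, the znodes can be inspected with the ZooKeeper command-line client shipped with the ZooKeeper install used in the cluster tutorial. The paths below assume the default `druid.zk.paths.base` of `/druid` and a local ZooKeeper on port 2181; this is just an inspection sketch, not something Druid requires you to run.

<code>
# List nodes that have announced themselves, and the segments they serve
./bin/zkCli.sh -server localhost:2181 ls /druid/announcements
./bin/zkCli.sh -server localhost:2181 ls /druid/servedSegments
</code>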
Contents
* [[Introduction|Home]]
* [[Download]]
* [[Support]]
* [[Contribute]]
========================
Getting Started
* [[Tutorial: A First Look at Druid]]
* [[Tutorial: The Druid Cluster]]
* [[Loading Your Data]]
* [[Querying Your Data]]
* [[Booting a Production Cluster]]
* [[Examples]]
* [[Cluster Setup]]
* [[Configuration]]
--------------------------------------
Data Ingestion
* [[Realtime]]
* [[Batch|Batch Ingestion]]
* [[Indexing Service]]
----------------------------
Querying
* [[Querying]]
  * [[Aggregations]]
  * [[Granularities]]
* Query Types
  * [[SearchQuery]]
  * [[SegmentMetadataQuery]]
  * [[TimeseriesQuery]]
---------------------------
Architecture
* [[Design]]
* [[Segments]]
* Node Types
  * [[Broker]]
  * [[Realtime]]
  * [[Plumber]]
* External Dependencies
  * [[MySQL]]
  * [[Concepts and Terminology]]
-------------------------------
Development
* [[Versioning]]
* [[Build From Source]]
* [[Libraries]]
------------------------
Misc
* [[Thanks]]
-------------