Commit 5a3d86ab authored by wizardforcel

init

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# dotenv
.env
# virtualenv
.venv
venv/
ENV/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.DS_Store
# gitbook
_book
# node.js
node_modules
# windows
Thumbs.db
# word
~$*.docx
~$*.doc
# HBase™ 中文参考指南 3.0
![](img/9401b38c9b161e7f2680ea8cd2972031.jpg)
> Author: [Apache HBase Team](hbase-dev@lists.apache.org)
(Work in progress)
+ [HBase™ 中文参考指南 3.0](README.md)
+ [Preface](docs/0.md)
+ [Getting Started](docs/1.md)
+ [Apache HBase Configuration](docs/2.md)
+ [Upgrading](docs/3.md)
+ [The Apache HBase Shell](docs/4.md)
+ [Data Model](docs/5.md)
+ [HBase and Schema Design](docs/6.md)
+ [RegionServer Sizing Rules of Thumb](docs/7.md)
+ [HBase and MapReduce](docs/8.md)
+ [Securing Apache HBase](docs/9.md)
+ [Architecture](docs/10.md)
+ [In-memory Compaction](docs/11.md)
+ [Backup and Restore](docs/12.md)
+ [Synchronous Replication](docs/13.md)
+ [Apache HBase APIs](docs/14.md)
+ [Apache HBase External APIs](docs/15.md)
+ [Thrift API and Filter Language](docs/16.md)
+ [HBase and Spark](docs/17.md)
+ [Apache HBase Coprocessors](docs/18.md)
+ [Apache HBase Performance Tuning](docs/19.md)
+ [Troubleshooting and Debugging Apache HBase](docs/20.md)
+ [Apache HBase Case Studies](docs/21.md)
+ [Apache HBase Operational Management](docs/22.md)
+ [Building and Developing Apache HBase](docs/23.md)
+ [Unit Testing HBase Applications](docs/24.md)
+ [Protobuf in HBase](docs/25.md)
+ [Procedure Framework (Pv2): HBASE-12439](docs/26.md)
+ [AMv2 Description for Devs](docs/27.md)
+ [ZooKeeper](docs/28.md)
+ [Community](docs/29.md)
+ [Appendix](docs/30.md)
# Preface
This is the official reference guide for the [HBase](https://hbase.apache.org/) version it ships with.
Herein you will find either the definitive documentation on an HBase topic as of its standing when the referenced HBase version shipped, or it will point to the location in [Javadoc](https://hbase.apache.org/apidocs/index.html) or [JIRA](https://issues.apache.org/jira/browse/HBASE) where the pertinent information can be found.
About This Guide
This reference guide is a work in progress. The source for this guide can be found in the _src/main/asciidoc directory of the HBase source. This reference guide is marked up using [AsciiDoc](http://asciidoc.org/) from which the finished guide is generated as part of the 'site' build target. Run
```
mvn site
```
to generate this documentation. Amendments and improvements to the documentation are welcomed. Click [this link](https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12310753&issuetype=1&components=12312132&summary=SHORT+DESCRIPTION) to file a new documentation bug against Apache HBase with some values pre-selected.
Contributing to the Documentation
For an overview of AsciiDoc and suggestions to get started contributing to the documentation, see the [relevant section later in this documentation](#appendix_contributing_to_documentation).
Heads-up if this is your first foray into the world of distributed computing…
If this is your first foray into the wonderful world of Distributed Computing, then you are in for some interesting times. First off, distributed systems are hard; making a distributed system hum requires a disparate skillset that spans systems (hardware and software) and networking.
Your cluster’s operation can hiccup because of any of a myriad of reasons, from bugs in HBase itself, through misconfigurations (of HBase, but also of the operating system), through to hardware problems, whether it be a bug in your network card drivers or an underprovisioned RAM bus (to mention two recent examples of hardware issues that manifested as "HBase is slow"). You will also need to do a recalibration if up to this point your computing has been bound to a single box. Here is one good starting point: [Fallacies of Distributed Computing](http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing).
That said, you are welcome.
It’s a fun place to be.
Yours, the HBase Community.
Reporting Bugs
Please use [JIRA](https://issues.apache.org/jira/browse/hbase) to report non-security-related bugs.
To protect existing HBase installations from new vulnerabilities, please **do not** use JIRA to report security-related bugs. Instead, send your report to the mailing list [private@hbase.apache.org](mailto:private@hbase.apache.org), which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report.
Support and Testing Expectations
The phrases _supported_, _not supported_, _tested_, and _not tested_ occur in several places throughout this guide. In the interest of clarity, here is a brief explanation of what is generally meant by these phrases, in the context of HBase.
> Commercial technical support for Apache HBase is provided by many Hadoop vendors. This is not the sense in which the term _support_ is used in the context of the Apache HBase project. The Apache HBase team assumes no responsibility for your HBase clusters, your configuration, or your data.
Supported
In the context of Apache HBase, _supported_ means that HBase is designed to work in the way described, and deviation from the defined behavior or functionality should be reported as a bug.
Not Supported
In the context of Apache HBase, _not supported_ means that a use case or use pattern is not expected to work and should be considered an antipattern. If you think this designation should be reconsidered for a given feature or use pattern, file a JIRA or start a discussion on one of the mailing lists.
Tested
In the context of Apache HBase, _tested_ means that a feature is covered by unit or integration tests, and has been proven to work as expected.
Not Tested
In the context of Apache HBase, _not tested_ means that a feature or use pattern may or may not work in a given way, and may or may not corrupt your data or cause operational issues. It is an unknown, and there are no guarantees. If you can provide proof that a feature designated as _not tested_ does work in a given way, please submit the tests and/or the metrics so that other users can gain certainty about such features or use patterns.
# In-memory Compaction
## 77\. Overview
In-memory Compaction (A.K.A Accordion) is a new feature in hbase-2.0.0. It was first introduced on the Apache HBase Blog at [Accordion: HBase Breathes with In-Memory Compaction](https://blogs.apache.org/hbase/entry/accordion-hbase-breathes-with-in). Quoting the blog:
> Accordion reapplies the LSM principal [_Log-Structured-Merge Tree_, the design pattern upon which HBase is based] to MemStore, in order to eliminate redundancies and other overhead while the data is still in RAM. Doing so decreases the frequency of flushes to HDFS, thereby reducing the write amplification and the overall disk footprint. With less flushes, the write operations are stalled less frequently as the MemStore overflows, therefore the write performance is improved. Less data on disk also implies less pressure on the block cache, higher hit rates, and eventually better read response times. Finally, having less disk writes also means having less compaction happening in the background, i.e., less cycles are stolen from productive (read and write) work. All in all, the effect of in-memory compaction can be envisioned as a catalyst that enables the system move faster as a whole.
A developer view is available at [Accordion: Developer View of In-Memory Compaction](https://blogs.apache.org/hbase/entry/accordion-developer-view-of-in).
In-memory compaction works best under high data churn; overwrites and over-versions can be eliminated while the data is still in memory. If the writes are all unique, it may drag write throughput (in-memory compaction costs CPU). We suggest you test and compare before deploying to production.
In this section we describe how to enable Accordion and the available configurations.
## 78\. Enabling
To enable in-memory compactions, set the _IN_MEMORY_COMPACTION_ attribute on each column family where you want the behavior. The _IN_MEMORY_COMPACTION_ attribute can have one of four values.
* _NONE_: No in-memory compaction.
* _BASIC_: Basic policy enables flushing and keeps a pipeline of flushes until we trip the pipeline maximum threshold and then we flush to disk. No in-memory compaction but can help throughput as data is moved from the profligate, native ConcurrentSkipListMap data-type to more compact (and efficient) data types.
* _EAGER_: This is _BASIC_ policy plus in-memory compaction of flushes (much like the on-disk compactions done to hfiles); on compaction we apply on-disk rules eliminating versions, duplicates, ttl’d cells, etc.
* _ADAPTIVE_: Adaptive compaction adapts to the workload. It applies either index compaction or data compaction based on the ratio of duplicate cells in the data. Experimental.
To enable _BASIC_ on the _info_ column family in the table _radish_, disable the table and add the attribute to the _info_ column family, and then reenable:
```
hbase(main):002:0> disable 'radish'
Took 0.5570 seconds
hbase(main):003:0> alter 'radish', {NAME => 'info', IN_MEMORY_COMPACTION => 'BASIC'}
Updating all regions with the new schema...
All regions updated.
Done.
Took 1.2413 seconds
hbase(main):004:0> describe 'radish'
Table radish is DISABLED
radish
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {
'IN_MEMORY_COMPACTION' => 'BASIC'}}
1 row(s)
Took 0.0239 seconds
hbase(main):005:0> enable 'radish'
Took 0.7537 seconds
```
Note how the IN_MEMORY_COMPACTION attribute shows as part of the _METADATA_ map.
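The same setting can also be applied programmatically. The following is a minimal sketch using the HBase 2.x Java client (it assumes the `ColumnFamilyDescriptorBuilder`, `MemoryCompactionPolicy`, and `Admin.modifyColumnFamily` APIs of that client, and reuses the _radish_/_info_ names from the shell example above):
```
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MemoryCompactionPolicy;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableInMemoryCompaction {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = connection.getAdmin()) {
      TableName tableName = TableName.valueOf("radish");
      // Fetch the existing 'info' family descriptor and rebuild it with BASIC in-memory compaction.
      ColumnFamilyDescriptor info =
          admin.getDescriptor(tableName).getColumnFamily(Bytes.toBytes("info"));
      ColumnFamilyDescriptor updated = ColumnFamilyDescriptorBuilder.newBuilder(info)
          .setInMemoryCompaction(MemoryCompactionPolicy.BASIC)
          .build();
      admin.disableTable(tableName);
      admin.modifyColumnFamily(tableName, updated); // same effect as the shell alter above
      admin.enableTable(tableName);
    }
  }
}
```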
There is also a global configuration, _hbase.hregion.compacting.memstore.type_, which you can set in your _hbase-site.xml_ file. Use it to set the default for newly created tables (on creation of a column family Store, we look first to the column family configuration for the _IN_MEMORY_COMPACTION_ setting; if none is present, we fall back to the _hbase.hregion.compacting.memstore.type_ value; the default is _BASIC_).
By default, new hbase system tables will have _BASIC_ in-memory compaction set. To specify otherwise, on new table-creation, set _hbase.hregion.compacting.memstore.type_ to _NONE_ (Note, setting this value post-creation of system tables will not have a retroactive effect; you will have to alter your tables to set the in-memory attribute to _NONE_).
When an in-memory flush happens is calculated by dividing the configured region flush size (set in the table descriptor or read from _hbase.hregion.memstore.flush.size_) by the number of column families and then multiplying by _hbase.memstore.inmemoryflush.threshold.factor_, whose default is 0.014. For example, with a 128 MB region flush size, two column families, and the default factor, an in-memory flush is triggered once a column family Store reaches roughly 128 MB / 2 × 0.014 ≈ 0.9 MB.
The number of flushes carried by the pipeline is monitored so as to fit within the bounds of memstore sizing but you can also set a maximum on the number of flushes total by setting _hbase.hregion.compacting.pipeline.segments.limit_. Default is 2.
When a column family Store is created, it says what memstore type is in effect. As of this writing there is the old-school _DefaultMemStore_ which fills a _ConcurrentSkipListMap_ and then flushes to disk or the new _CompactingMemStore_ that is the implementation that provides this new in-memory compactions facility. Here is a log-line from a RegionServer that shows a column family Store named _family_ configured to use a _CompactingMemStore_:
```
2018-03-30 11:02:24,466 INFO [Time-limited test] regionserver.HStore(325): Store=family, memstore type=CompactingMemStore, storagePolicy=HOT, verifyBulkLoads=false, parallelPutCountPrintThreshold=10
```
Enable TRACE-level logging on the CompactingMemStore class (_org.apache.hadoop.hbase.regionserver.CompactingMemStore_) to see detail on its operation.
# Synchronous Replication
## 93\. Background
The current [replication](#_cluster_replication) in HBase is asynchronous, so if the master cluster crashes, the slave cluster may not have the newest data. If users want strong consistency, they cannot switch to the slave cluster.
## 94\. Design
Please see the design doc on [HBASE-19064](https://issues.apache.org/jira/browse/HBASE-19064).
## 95\. Operation and maintenance
Case.1 Setup two synchronous replication clusters
* Add a synchronous peer in both source cluster and peer cluster.
For source cluster:
```
hbase> add_peer '1', CLUSTER_KEY => 'lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase-slave', REMOTE_WAL_DIR=>'hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase-slave/remoteWALs', TABLE_CFS => {"ycsb-test"=>[]}
```
For peer cluster:
```
hbase> add_peer '1', CLUSTER_KEY => 'lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase', REMOTE_WAL_DIR=>'hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase/remoteWALs', TABLE_CFS => {"ycsb-test"=>[]}
```
> For synchronous replication, the current implementation requires that the source and peer clusters use the same peer id. Another thing that needs attention: the peer does not support cluster-level, namespace-level, or cf-level replication; only table-level replication is supported for now.
* Transit the peer cluster to be STANDBY state
```
hbase> transit_peer_sync_replication_state '1', 'STANDBY'
```
* Transit the source cluster to be ACTIVE state
```
hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
```
Now, synchronous replication has been set up successfully. The HBase client can only send requests to the source cluster; if it sends requests to the peer cluster, the peer cluster, which is now in STANDBY state, will reject the read/write requests.
Case.2 How to operate when standby cluster crashed
If the standby cluster has crashed, remote WAL writes for the active cluster will fail. So we need to transit the source cluster to the DOWNGRADE_ACTIVE state, which means the source cluster won’t write any remote WAL any more, while normal (asynchronous) replication still works fine: it queues the newly written WALs, but replication is blocked until the peer cluster comes back.
```
hbase> transit_peer_sync_replication_state '1', 'DOWNGRADE_ACTIVE'
```
Once the peer cluster comes back, we can just transit the source cluster to ACTIVE to ensure that the replication will be synchronous.
```
hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
```
Case.3 How to operate when active cluster crashed
If the active cluster has crashed (it may not be reachable now), just transit the standby cluster to the DOWNGRADE_ACTIVE state; after that, redirect all client requests to the DOWNGRADE_ACTIVE cluster.
```
hbase> transit_peer_sync_replication_state '1', 'DOWNGRADE_ACTIVE'
```
If the crashed cluster comes back again, we just need to transit it to STANDBY directly. Otherwise, if you transit it to DOWNGRADE_ACTIVE, the original ACTIVE cluster may have redundant data compared to the current ACTIVE cluster. Because the design writes source-cluster WALs and remote-cluster WALs concurrently, it is possible that the source cluster’s WALs have more data than the remote cluster’s, which would result in data inconsistency. Transiting from ACTIVE to STANDBY has no such problem, because we skip replaying the original WALs.
```
hbase> transit_peer_sync_replication_state '1', 'STANDBY'
```
After that, we can promote the DOWNGRADE_ACTIVE cluster to ACTIVE now, to ensure that the replication will be synchronous.
```
hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
```
# Apache HBase APIs
This chapter provides information about performing operations using HBase native APIs. This information is not exhaustive, and provides a quick reference in addition to the [User API Reference](https://hbase.apache.org/apidocs/index.html). The examples here are not comprehensive or complete, and should be used for purposes of illustration only.
Apache HBase also works with multiple external APIs. See [Apache HBase External APIs](#external_apis) for more information.
## 96\. Examples
Example 25\. Create, modify and delete a Table Using Java
```
package com.example.hbase.admin;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;

public class Example {

  private static final String TABLE_NAME = "MY_TABLE_NAME_TOO";
  private static final String CF_DEFAULT = "DEFAULT_COLUMN_FAMILY";

  public static void createOrOverwrite(Admin admin, HTableDescriptor table) throws IOException {
    if (admin.tableExists(table.getTableName())) {
      admin.disableTable(table.getTableName());
      admin.deleteTable(table.getTableName());
    }
    admin.createTable(table);
  }

  public static void createSchemaTables(Configuration config) throws IOException {
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {

      HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
      table.addFamily(new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.NONE));

      System.out.print("Creating table. ");
      createOrOverwrite(admin, table);
      System.out.println(" Done.");
    }
  }

  public static void modifySchema (Configuration config) throws IOException {
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {

      TableName tableName = TableName.valueOf(TABLE_NAME);
      if (!admin.tableExists(tableName)) {
        System.out.println("Table does not exist.");
        System.exit(-1);
      }

      HTableDescriptor table = admin.getTableDescriptor(tableName);

      // Update existing table
      HColumnDescriptor newColumn = new HColumnDescriptor("NEWCF");
      newColumn.setCompactionCompressionType(Algorithm.GZ);
      newColumn.setMaxVersions(HConstants.ALL_VERSIONS);
      admin.addColumn(tableName, newColumn);

      // Update existing column family
      HColumnDescriptor existingColumn = new HColumnDescriptor(CF_DEFAULT);
      existingColumn.setCompactionCompressionType(Algorithm.GZ);
      existingColumn.setMaxVersions(HConstants.ALL_VERSIONS);
      table.modifyFamily(existingColumn);
      admin.modifyTable(tableName, table);

      // Disable an existing table
      admin.disableTable(tableName);

      // Delete an existing column family
      admin.deleteColumn(tableName, CF_DEFAULT.getBytes("UTF-8"));

      // Delete a table (Need to be disabled first)
      admin.deleteTable(tableName);
    }
  }

  public static void main(String... args) throws IOException {
    Configuration config = HBaseConfiguration.create();

    //Add any necessary configuration files (hbase-site.xml, core-site.xml)
    config.addResource(new Path(System.getenv("HBASE_CONF_DIR"), "hbase-site.xml"));
    config.addResource(new Path(System.getenv("HADOOP_CONF_DIR"), "core-site.xml"));
    createSchemaTables(config);
    modifySchema(config);
  }
}
```
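Data operations go through the `Table` interface rather than `Admin`. Below is a minimal sketch of a put followed by a get; the row key, qualifier, and value are illustrative, and the table and column family names reuse those from the example above (the table must still exist when this runs):
```
package com.example.hbase.client;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutGetExample {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(config);
         Table table = connection.getTable(TableName.valueOf("MY_TABLE_NAME_TOO"))) {

      // Write one cell: row "row1", family "DEFAULT_COLUMN_FAMILY", qualifier "q1".
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("DEFAULT_COLUMN_FAMILY"), Bytes.toBytes("q1"), Bytes.toBytes("value1"));
      table.put(put);

      // Read it back.
      Get get = new Get(Bytes.toBytes("row1"));
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("DEFAULT_COLUMN_FAMILY"), Bytes.toBytes("q1"));
      System.out.println("Read back: " + Bytes.toString(value));
    }
  }
}
```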
# Thrift API and Filter Language
Apache [Thrift](https://thrift.apache.org/) is a cross-platform, cross-language development framework. HBase includes a Thrift API and filter language. The Thrift API relies on client and server processes.
You can configure Thrift for secure authentication at the server and client side, by following the procedures in [Client-side Configuration for Secure Operation - Thrift Gateway](#security.client.thrift) and [Configure the Thrift Gateway to Authenticate on Behalf of the Client](#security.gateway.thrift).
The rest of this chapter discusses the filter language provided by the Thrift API.
## 103\. Filter Language
Thrift Filter Language was introduced in HBase 0.92. It allows you to perform server-side filtering when accessing HBase over Thrift or in the HBase shell. You can find out more about shell integration by using the `scan help` command in the shell.
You specify a filter as a string, which is parsed on the server to construct the filter.
### 103.1\. General Filter String Syntax
A simple filter expression is expressed as a string:
```
"FilterName (argument, argument,... , argument)"
```
Keep the following syntax guidelines in mind.
* Specify the name of the filter followed by the comma-separated argument list in parentheses.
* If the argument represents a string, it should be enclosed in single quotes (`'`).
* Arguments which represent a boolean, an integer, or a comparison operator (such as <, >, or !=), should not be enclosed in quotes.
* The filter name must be a single word. All ASCII characters are allowed except for whitespace, single quotes and parentheses.
* The filter’s arguments can contain any ASCII character. If single quotes are present in the argument, they must be escaped by an additional preceding single quote.
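These filter strings are not limited to the Thrift gateway. The sketch below assumes the `ParseFilter` utility in `org.apache.hadoop.hbase.filter`, which turns such a string into a `Filter` instance that can be attached to a `Scan` from Java; the table name and filter string here are placeholders:
```
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.ParseFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterStringExample {
  public static void main(String[] args) throws Exception {
    // Parse a filter string into a Filter object, then attach it to a Scan.
    Filter filter = new ParseFilter().parseFilterString(
        "PrefixFilter ('row2') AND QualifierFilter (>=, 'binary:xyz')");
    Scan scan = new Scan();
    scan.setFilter(filter);

    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("table_name"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result result : scanner) {
        System.out.println(Bytes.toString(result.getRow()));
      }
    }
  }
}
```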
### 103.2\. Compound Filters and Operators
Binary Operators
`AND`
If the `AND` operator is used, the key-value must satisfy both filters.
`OR`
If the `OR` operator is used, the key-value must satisfy at least one of the filters.
Unary Operators
`SKIP`
For a particular row, if any of the key-values fail the filter condition, the entire row is skipped.
`WHILE`
For a particular row, key-values will be emitted until a key-value is reached that fails the filter condition.
Example 29\. Compound Operators
You can combine multiple operators to create a hierarchy of filters, such as the following example:
```
(Filter1 AND Filter2) OR (Filter3 AND Filter4)
```
### 103.3\. Order of Evaluation
1. Parentheses have the highest precedence.
2. The unary operators `SKIP` and `WHILE` are next, and have the same precedence.
3. The binary operators follow. `AND` has highest precedence, followed by `OR`.
Example 30\. Precedence Example
```
Filter1 AND Filter2 OR Filter3
is evaluated as
(Filter1 AND Filter2) OR Filter3
```
```
Filter1 AND SKIP Filter2 OR Filter3
is evaluated as
(Filter1 AND (SKIP Filter2)) OR Filter3
```
You can use parentheses to explicitly control the order of evaluation.
### 103.4\. Compare Operator
The following compare operators are provided:
1. LESS (<)
2. LESS_OR_EQUAL (<=)
3. EQUAL (=)
4. NOT_EQUAL (!=)
5. GREATER_OR_EQUAL (>=)
6. GREATER (>)
7. NO_OP (no operation)
The client should use the symbols (<, <=, =, !=, >, >=) to express compare operators.
### 103.5\. Comparator
A comparator can be any of the following:
1. _BinaryComparator_ - This lexicographically compares against the specified byte array using Bytes.compareTo(byte[], byte[])
2. _BinaryPrefixComparator_ - This lexicographically compares against a specified byte array. It only compares up to the length of this byte array.
3. _RegexStringComparator_ - This compares against the specified byte array using the given regular expression. Only EQUAL and NOT_EQUAL comparisons are valid with this comparator
4. _SubStringComparator_ - This tests if the given substring appears in a specified byte array. The comparison is case insensitive. Only EQUAL and NOT_EQUAL comparisons are valid with this comparator
The general syntax of a comparator is: `ComparatorType:ComparatorValue`
The ComparatorType for the various comparators is as follows:
1. _BinaryComparator_ - binary
2. _BinaryPrefixComparator_ - binaryprefix
3. _RegexStringComparator_ - regexstring
4. _SubStringComparator_ - substring
The ComparatorValue can be any value.
Example ComparatorValues
1. `binary:abc` will match everything that is lexicographically greater than "abc"
2. `binaryprefix:abc` will match everything whose first 3 characters are lexicographically equal to "abc"
3. `regexstring:ab*yz` will match everything whose value matches the regular expression "ab*yz"
4. `substring:abc123` will match everything that begins with the substring "abc123"
### 103.6\. Example PHP Client Program that uses the Filter Language
```
<?
$_SERVER['PHP_ROOT'] = realpath(dirname(__FILE__).'/..');
require_once $_SERVER['PHP_ROOT'].'/flib/__flib.php';
flib_init(FLIB_CONTEXT_SCRIPT);
require_module('storage/hbase');
$hbase = new HBase('<server_name_running_thrift_server>', <port on which thrift server is running>);
$hbase->open();
$client = $hbase->getClient();
$result = $client->scannerOpenWithFilterString('table_name', "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))");
$to_print = $client->scannerGetList($result,1);
while ($to_print) {
print_r($to_print);
$to_print = $client->scannerGetList($result,1);
}
$client->scannerClose($result);
?>
```
### 103.7\. Example Filter Strings
* `"PrefixFilter ('Row') AND PageFilter (1) AND FirstKeyOnlyFilter ()"` will return all key-value pairs that match the following conditions:
1. The row containing the key-value should have prefix _Row_
2. The key-value must be located in the first row of the table
3. The key-value pair must be the first key-value in the row
* `"(RowFilter (=, 'binary:Row 1') AND TimeStampsFilter (74689, 89734)) OR ColumnRangeFilter ('abc', true, 'xyz', false))"` will return all key-value pairs that match both the following conditions:
* The key-value is in a row having row key _Row 1_
* The key-value must have a timestamp of either 74689 or 89734.
* Or it must match the following condition:
* The key-value pair must be in a column that is lexicographically >= abc and < xyz 
* `"SKIP ValueFilter (0)"` will skip the entire row if any of the values in the row is not 0
### 103.8\. Individual Filter Syntax
KeyOnlyFilter
This filter doesn’t take any arguments. It returns only the key component of each key-value.
FirstKeyOnlyFilter
This filter doesn’t take any arguments. It returns only the first key-value from each row.
PrefixFilter
This filter takes one argument – a prefix of a row key. It returns only those key-values present in a row that starts with the specified row prefix
ColumnPrefixFilter
This filter takes one argument – a column prefix. It returns only those key-values present in a column that starts with the specified column prefix. The column prefix must be of the form: `“qualifier”`.
MultipleColumnPrefixFilter
This filter takes a list of column prefixes. It returns key-values that are present in a column that starts with any of the specified column prefixes. Each of the column prefixes must be of the form: `“qualifier”`.
ColumnCountGetFilter
This filter takes one argument – a limit. It returns the first limit number of columns in the table.
PageFilter
This filter takes one argument – a page size. It returns page size number of rows from the table.
ColumnPaginationFilter
This filter takes two arguments – a limit and offset. It returns limit number of columns after offset number of columns. It does this for all the rows.
InclusiveStopFilter
This filter takes one argument – a row key on which to stop scanning. It returns all key-values present in rows up to and including the specified row.
TimeStampsFilter
This filter takes a list of timestamps. It returns those key-values whose timestamps match any of the specified timestamps.
RowFilter
This filter takes a compare operator and a comparator. It compares each row key with the comparator using the compare operator and if the comparison returns true, it returns all the key-values in that row.
FamilyFilter
This filter takes a compare operator and a comparator. It compares each column family name with the comparator using the compare operator and if the comparison returns true, it returns all the Cells in that column family.
QualifierFilter
This filter takes a compare operator and a comparator. It compares each qualifier name with the comparator using the compare operator and if the comparison returns true, it returns all the key-values in that column.
ValueFilter
This filter takes a compare operator and a comparator. It compares each value with the comparator using the compare operator and if the comparison returns true, it returns that key-value.
DependentColumnFilter
This filter takes two arguments – a family and a qualifier. It tries to locate this column in each row and returns all key-values in that row that have the same timestamp. If the row doesn’t contain the specified column – none of the key-values in that row will be returned.
SingleColumnValueFilter
This filter takes a column family, a qualifier, a compare operator and a comparator. If the specified column is not found – all the columns of that row will be emitted. If the column is found and the comparison with the comparator returns true, all the columns of the row will be emitted. If the condition fails, the row will not be emitted.
SingleColumnValueExcludeFilter
This filter takes the same arguments and behaves same as SingleColumnValueFilter – however, if the column is found and the condition passes, all the columns of the row will be emitted except for the tested column value.
ColumnRangeFilter
This filter is used for selecting only those keys with columns that are between minColumn and maxColumn. It also takes two boolean variables to indicate whether to include the minColumn and maxColumn or not.
# Apache HBase Case Studies
## 147\. Overview
This chapter will describe a variety of performance and troubleshooting case studies that can provide a useful blueprint on diagnosing Apache HBase cluster issues.
For more information on Performance and Troubleshooting, see [Apache HBase Performance Tuning](#performance) and [Troubleshooting and Debugging Apache HBase](#trouble).
## 148\. Schema Design
See the schema design case studies here: [Schema Design Case Studies](#schema.casestudies)
## 149\. Performance/Troubleshooting
### 149.1\. Case Study #1 (Performance Issue On A Single Node)
#### 149.1.1\. Scenario
Following a scheduled reboot, one data node began exhibiting unusual behavior. Routine MapReduce jobs run against HBase tables which regularly completed in five or six minutes began taking 30 or 40 minutes to finish. These jobs were consistently found to be waiting on map and reduce tasks assigned to the troubled data node (e.g., the slow map tasks all had the same Input Split). The situation came to a head during a distributed copy, when the copy was severely prolonged by the lagging node.
#### 149.1.2\. Hardware
Datanodes:
* Two 12-core processors
* Six Enterprise SATA disks
* 24GB of RAM
* Two bonded gigabit NICs
Network:
* 10 Gigabit top-of-rack switches
* 20 Gigabit bonded interconnects between racks.
#### 149.1.3\. Hypotheses
##### HBase "Hot Spot" Region
We hypothesized that we were experiencing a familiar point of pain: a "hot spot" region in an HBase table, where uneven key-space distribution can funnel a huge number of requests to a single HBase region, bombarding the RegionServer process and causing slow response time. Examination of the HBase Master status page showed that the number of HBase requests to the troubled node was almost zero. Further, examination of the HBase logs showed that there were no region splits, compactions, or other region transitions in progress. This effectively ruled out a "hot spot" as the root cause of the observed slowness.
##### HBase Region With Non-Local Data
Our next hypothesis was that one of the MapReduce tasks was requesting data from HBase that was not local to the DataNode, thus forcing HDFS to request data blocks from other servers over the network. Examination of the DataNode logs showed that there were very few blocks being requested over the network, indicating that the HBase region was correctly assigned, and that the majority of the necessary data was located on the node. This ruled out the possibility of non-local data causing a slowdown.
##### Excessive I/O Wait Due To Swapping Or An Over-Worked Or Failing Hard Disk
After concluding that the Hadoop and HBase were not likely to be the culprits, we moved on to troubleshooting the DataNode’s hardware. Java, by design, will periodically scan its entire memory space to do garbage collection. If system memory is heavily overcommitted, the Linux kernel may enter a vicious cycle, using up all of its resources swapping Java heap back and forth from disk to RAM as Java tries to run garbage collection. Further, a failing hard disk will often retry reads and/or writes many times before giving up and returning an error. This can manifest as high iowait, as running processes wait for reads and writes to complete. Finally, a disk nearing the upper edge of its performance envelope will begin to cause iowait as it informs the kernel that it cannot accept any more data, and the kernel queues incoming data into the dirty write pool in memory. However, using `vmstat(1)` and `free(1)`, we could see that no swap was being used, and the amount of disk IO was only a few kilobytes per second.
##### Slowness Due To High Processor Usage
Next, we checked to see whether the system was performing slowly simply due to very high computational load. `top(1)` showed that the system load was higher than normal, but `vmstat(1)` and `mpstat(1)` showed that the amount of processor being used for actual computation was low.
##### Network Saturation (The Winner)
Since neither the disks nor the processors were being utilized heavily, we moved on to the performance of the network interfaces. The DataNode had two gigabit ethernet adapters, bonded to form an active-standby interface. `ifconfig(8)` showed some unusual anomalies, namely interface errors, overruns, framing errors. While not unheard of, these kinds of errors are exceedingly rare on modern hardware which is operating as it should:
```
$ /sbin/ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:10.x.x.x Bcast:10.x.x.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:2990700159 errors:12 dropped:0 overruns:1 frame:6 <--- Look Here! Errors!
TX packets:3443518196 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2416328868676 (2.4 TB) TX bytes:3464991094001 (3.4 TB)
```
These errors immediately led us to suspect that one or more of the ethernet interfaces might have negotiated the wrong line speed. This was confirmed both by running an ICMP ping from an external host and observing round-trip-time in excess of 700ms, and by running `ethtool(8)` on the members of the bond interface and discovering that the active interface was operating at 100Mb/s, full duplex.
```
$ sudo ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Speed: 100Mb/s <--- Look Here! Should say 1000Mb/s!
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: umbg
Wake-on: g
Current message level: 0x00000003 (3)
Link detected: yes
```
In normal operation, the ICMP ping round trip time should be around 20ms, and the interface speed and duplex should read "1000Mb/s" and "Full", respectively.
#### 149.1.4\. Resolution
After determining that the active ethernet adapter was at the incorrect speed, we used the `ifenslave(8)` command to make the standby interface the active interface, which yielded an immediate improvement in MapReduce performance, and a 10 times improvement in network throughput.
On the next trip to the datacenter, we determined that the line speed issue was ultimately caused by a bad network cable, which was replaced.
### 149.2\. Case Study #2 (Performance Research 2012)
Investigation results of a self-described "we’re not sure what’s wrong, but it seems slow" problem. [http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html](http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html)
### 149.3\. Case Study #3 (Performance Research 2010)
Investigation results of general cluster performance from 2010. Although this research is on an older version of the codebase, this writeup is still very useful in terms of approach. [http://hstack.org/hbase-performance-testing/](http://hstack.org/hbase-performance-testing/)
### 149.4\. Case Study #4 (max.transfer.threads Config)
Case study of configuring `max.transfer.threads` (previously known as `xcievers`) and diagnosing errors from misconfigurations. [http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html](http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html)
See also [`dfs.datanode.max.transfer.threads`](#dfs.datanode.max.transfer.threads) .
# Unit Testing HBase Applications
This chapter discusses unit testing your HBase application using JUnit, Mockito, MRUnit, and HBaseTestingUtility. Much of the information comes from [a community blog post about testing HBase applications](http://blog.cloudera.com/blog/2013/09/how-to-test-hbase-applications-using-popular-tools/). For information on unit tests for HBase itself, see [hbase.tests](#hbase.tests).
## 175\. JUnit
HBase uses [JUnit](http://junit.org) for unit tests.
This example will add unit tests to the following example class:
```
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MyHBaseDAO {

  public static void insertRecord(Table table, HBaseTestObj obj) throws Exception {
    Put put = createPut(obj);
    table.put(put);
  }

  // Package-private so the unit test below can exercise it directly.
  static Put createPut(HBaseTestObj obj) {
    Put put = new Put(Bytes.toBytes(obj.getRowKey()));
    put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1"), Bytes.toBytes(obj.getData1()));
    put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2"), Bytes.toBytes(obj.getData2()));
    return put;
  }
}
```
The first step is to add JUnit dependencies to your Maven POM file:
```
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
```
Next, add some unit tests to your code. Tests are annotated with `@Test`.
```
public class TestMyHbaseDAOData {

  @Test
  public void testCreatePut() throws Exception {
    HBaseTestObj obj = new HBaseTestObj();
    obj.setRowKey("ROWKEY-1");
    obj.setData1("DATA-1");
    obj.setData2("DATA-2");
    Put put = MyHBaseDAO.createPut(obj);
    assertEquals(obj.getRowKey(), Bytes.toString(put.getRow()));
    assertEquals(obj.getData1(), Bytes.toString(put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")).get(0).getValue()));
    assertEquals(obj.getData2(), Bytes.toString(put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")).get(0).getValue()));
  }
}
```
These tests ensure that your `createPut` method creates, populates, and returns a `Put` object with expected values. Of course, JUnit can do much more than this. For an introduction to JUnit, see [https://github.com/junit-team/junit/wiki/Getting-started](https://github.com/junit-team/junit/wiki/Getting-started).
## 176\. Mockito
Mockito is a mocking framework. It goes further than JUnit by allowing you to test the interactions between objects without having to replicate the entire environment. You can read more about Mockito at its project site, [https://code.google.com/p/mockito/](https://code.google.com/p/mockito/).
You can use Mockito to do unit testing on smaller units. For instance, you can mock a `org.apache.hadoop.hbase.Server` instance or a `org.apache.hadoop.hbase.master.MasterServices` interface reference rather than a full-blown `org.apache.hadoop.hbase.master.HMaster`.
This example builds upon the example code in [unit.tests](#unit.tests), to test the `insertRecord` method.
First, add a dependency for Mockito to your Maven POM file.
```
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>2.1.0</version>
<scope>test</scope>
</dependency>
```
Next, add a `@RunWith` annotation to your test class, to direct it to use Mockito.
```
@RunWith(MockitoJUnitRunner.class)
public class TestMyHBaseDAO {

  // Mockito supplies these mocks; no real Configuration or Connection is created in this unit test.
  @Mock
  private Configuration config;
  @Mock
  private Connection connection;
  @Mock
  private Table table;
  @Captor
  private ArgumentCaptor<Put> putCaptor;

  @Test
  public void testInsertRecord() throws Exception {
    //return mock table when getTable is called
    when(connection.getTable(TableName.valueOf("tablename"))).thenReturn(table);
    //create test object and make a call to the DAO that needs testing
    HBaseTestObj obj = new HBaseTestObj();
    obj.setRowKey("ROWKEY-1");
    obj.setData1("DATA-1");
    obj.setData2("DATA-2");
    MyHBaseDAO.insertRecord(table, obj);
    verify(table).put(putCaptor.capture());
    Put put = putCaptor.getValue();
    assertEquals(Bytes.toString(put.getRow()), obj.getRowKey());
    assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")));
    assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")));
    assertEquals(Bytes.toString(put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")).get(0).getValue()), "DATA-1");
    assertEquals(Bytes.toString(put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")).get(0).getValue()), "DATA-2");
  }
}
```
This code populates `HBaseTestObj` with "ROWKEY-1", "DATA-1", and "DATA-2" as values. It then inserts the record into the mocked table. The Put that the DAO would have inserted is captured, and values are tested to verify that they are what you expected them to be.
The key here is to manage Connection and Table instance creation outside the DAO. This allows you to mock them cleanly and test Puts as shown above. Similarly, you can now expand into other operations such as Get, Scan, or Delete.
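For example, a production caller might look like the following sketch (the class name and table name are illustrative; `MyHBaseDAO` and `HBaseTestObj` are the classes from the examples above):
```
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class InsertRecordCaller {
  public static void main(String[] args) throws Exception {
    // The caller owns the Connection and Table; the DAO only receives the Table.
    // This is what makes the Mockito test above possible: the test passes a mock Table instead.
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("tablename"))) {
      HBaseTestObj obj = new HBaseTestObj();
      obj.setRowKey("ROWKEY-1");
      obj.setData1("DATA-1");
      obj.setData2("DATA-2");
      MyHBaseDAO.insertRecord(table, obj);
    }
  }
}
```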
## 177\. MRUnit
[Apache MRUnit](https://mrunit.apache.org/) is a library that allows you to unit-test MapReduce jobs. You can use it to test HBase jobs in the same way as other MapReduce jobs.
Given a MapReduce job that writes to an HBase table called `MyTest`, which has one column family called `CF`, the reducer of such a job could look like the following:
```
public class MyReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {

  public static final byte[] CF = "CF".getBytes();
  public static final byte[] QUALIFIER = "CQ-1".getBytes();

  public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    //bunch of processing to extract data to be inserted, in our case, let's say we are simply
    //appending all the records we receive from the mapper for this particular
    //key and insert one record into HBase
    StringBuffer data = new StringBuffer();
    Put put = new Put(Bytes.toBytes(key.toString()));
    for (Text val : values) {
      data = data.append(val);
    }
    put.add(CF, QUALIFIER, Bytes.toBytes(data.toString()));
    //write to HBase
    context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
  }
}
```
To test this code, the first step is to add a dependency to MRUnit to your Maven POM file.
```
<dependency>
<groupId>org.apache.mrunit</groupId>
<artifactId>mrunit</artifactId>
<version>1.0.0 </version>
<scope>test</scope>
</dependency>
```
Next, use the ReducerDriver provided by MRUnit, in your Reducer job.
```
public class MyReducerTest {

  ReduceDriver<Text, Text, ImmutableBytesWritable, Writable> reduceDriver;
  byte[] CF = "CF".getBytes();
  byte[] QUALIFIER = "CQ-1".getBytes();

  @Before
  public void setUp() {
    MyReducer reducer = new MyReducer();
    reduceDriver = ReduceDriver.newReduceDriver(reducer);
  }

  @Test
  public void testHBaseInsert() throws IOException {
    String strKey = "RowKey-1", strValue = "DATA", strValue1 = "DATA1",
        strValue2 = "DATA2";
    List<Text> list = new ArrayList<Text>();
    list.add(new Text(strValue));
    list.add(new Text(strValue1));
    list.add(new Text(strValue2));
    //since in our case all that the reducer is doing is appending the records that the mapper
    //sends it, we should get the following back
    String expectedOutput = strValue + strValue1 + strValue2;
    //Setup Input, mimic what mapper would have passed
    //to the reducer and run test
    reduceDriver.withInput(new Text(strKey), list);
    //run the reducer and get its output
    List<Pair<ImmutableBytesWritable, Writable>> result = reduceDriver.run();
    //extract key from result and verify
    assertEquals(Bytes.toString(result.get(0).getFirst().get()), strKey);
    //extract value for CF/QUALIFIER and verify
    Put a = (Put) result.get(0).getSecond();
    String c = Bytes.toString(a.get(CF, QUALIFIER).get(0).getValue());
    assertEquals(expectedOutput, c);
  }
}
```
Your MRUnit test verifies that the output is as expected, the Put that is inserted into HBase has the correct value, and the ColumnFamily and ColumnQualifier have the correct values.
MRUnit includes a MapperDriver to test mapping jobs, and you can use MRUnit to test other operations, including reading from HBase, processing data, or writing to HDFS.
## 178\. Integration Testing with an HBase Mini-Cluster
HBase ships with HBaseTestingUtility, which makes it easy to write integration tests using a _mini-cluster_. The first step is to add some dependencies to your Maven POM file. Check the versions to be sure they are appropriate.
```
<properties>
<hbase.version>2.0.0-SNAPSHOT</hbase.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-testing-util</artifactId>
<version>${hbase.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
```
This code represents an integration test for the MyDAO insert shown in [unit.tests](#unit.tests).
```
public class MyHBaseIntegrationTest {

  private static HBaseTestingUtility utility;
  byte[] CF = "CF".getBytes();
  byte[] CQ1 = "CQ-1".getBytes();
  byte[] CQ2 = "CQ-2".getBytes();

  @Before
  public void setup() throws Exception {
    utility = new HBaseTestingUtility();
    utility.startMiniCluster();
  }

  @Test
  public void testInsert() throws Exception {
    Table table = utility.createTable(Bytes.toBytes("MyTest"), CF);
    HBaseTestObj obj = new HBaseTestObj();
    obj.setRowKey("ROWKEY-1");
    obj.setData1("DATA-1");
    obj.setData2("DATA-2");
    MyHBaseDAO.insertRecord(table, obj);
    Get get1 = new Get(Bytes.toBytes(obj.getRowKey()));
    get1.addColumn(CF, CQ1);
    Result result1 = table.get(get1);
    assertEquals(Bytes.toString(result1.getRow()), obj.getRowKey());
    assertEquals(Bytes.toString(result1.value()), obj.getData1());
    Get get2 = new Get(Bytes.toBytes(obj.getRowKey()));
    get2.addColumn(CF, CQ2);
    Result result2 = table.get(get2);
    assertEquals(Bytes.toString(result2.getRow()), obj.getRowKey());
    assertEquals(Bytes.toString(result2.value()), obj.getData2());
  }
}
```
This code creates an HBase mini-cluster and starts it. Next, it creates a table called `MyTest` with one column family, `CF`. A record is inserted, a Get is performed from the same table, and the insertion is verified.
> Starting the mini-cluster takes about 20-30 seconds, but that should be appropriate for integration testing.
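Because starting the mini-cluster is slow, one common variant (a sketch, not part of the original example) is to start and stop it once per test class with `@BeforeClass`/`@AfterClass` instead of once per test method:
```
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class MyHBaseIntegrationTestBase {
  protected static HBaseTestingUtility utility;

  @BeforeClass
  public static void setUpCluster() throws Exception {
    utility = new HBaseTestingUtility();
    utility.startMiniCluster();    // starts mini DFS, ZooKeeper, and HBase
  }

  @AfterClass
  public static void tearDownCluster() throws Exception {
    utility.shutdownMiniCluster(); // release ports and temp directories when the class is done
  }
}
```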
See the paper at [HBase Case-Study: Using HBaseTestingUtility for Local Testing and Development](http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/) (2010) for more information about HBaseTestingUtility.
# Protobuf in HBase
## 179\. Protobuf
HBase uses Google’s [protobufs](https://developers.google.com/protocol-buffers/) wherever it persists metadata — in the tail of hfiles or Cells written by HBase into the system hbase:meta table or when HBase writes znodes to zookeeper, etc. — and when it passes objects over the wire making [RPCs](#hbase.rpc). HBase uses protobufs to describe the RPC Interfaces (Services) we expose to clients, for example the `Admin` and `Client` Interfaces that the RegionServer fields, or specifying the arbitrary extensions added by developers via our [Coprocessor Endpoint](#cp) mechanism.
In this chapter we go into detail for developers who are looking to understand better how it all works. This chapter is of particular use to those who would amend or extend HBase functionality.
With protobuf, you describe serializations and services in a `.proto` file. You then feed these descriptors to a protobuf tool, the `protoc` binary, to generate classes that can marshall and unmarshall the described serializations and field the specified Services.
See the `README.txt` in the HBase sub-modules for details on how to run the class generation on a per-module basis; e.g. see `hbase-protocol/README.txt` for how to generate protobuf classes in the hbase-protocol module.
In HBase, `.proto` files are either in the `hbase-protocol` module, a module dedicated to hosting the common proto files and the protoc-generated classes that HBase uses internally when serializing metadata, or, for extensions to HBase such as REST or Coprocessor Endpoints that need their own descriptors, inside the extension’s hosting module: e.g. `hbase-rest` is home to the REST proto files and the `hbase-rsgroup` table-grouping Coprocessor Endpoint has all protos that have to do with table grouping.
Protos are hosted by the module that makes use of them. While this makes it so generation of protobuf classes is distributed, done per module, we do it this way so modules encapsulate all to do with the functionality they bring to hbase.
Extensions, whether REST or Coprocessor Endpoints, will make use of core HBase protos found back in the hbase-protocol module. They’ll use these core protos when they want to serialize a Cell or a Put or refer to a particular node via ServerName, etc., as part of providing the CPEP Service. Going forward, after the release of hbase-2.0.0, this practice needs to wither. We’ll explain why in the later [hbase-2.0.0](#shaded.protobuf) section.
### 179.1\. hbase-2.0.0 and the shading of protobufs (HBASE-15638)
As of hbase-2.0.0, our protobuf usage gets a little more involved. HBase core protobuf references are offset so as to refer to a private, bundled protobuf. Core stops referring to protobuf classes at com.google.protobuf.* and instead references protobuf at the HBase-specific offset org.apache.hadoop.hbase.shaded.com.google.protobuf.*. We do this indirection so hbase core can evolve its protobuf version independent of whatever our dependencies rely on. For instance, HDFS serializes using protobuf. HDFS is on our CLASSPATH. Without the above described indirection, our protobuf versions would have to align. HBase would be stuck on the HDFS protobuf version until HDFS decided to upgrade. HBase and HDFS versions would be tied.
We had to move on from protobuf-2.5.0 because we need facilities added in protobuf-3.1.0; in particular being able to save on copies and avoiding bringing protobufs onheap for serialization/deserialization.
In hbase-2.0.0, we introduced a new module, `hbase-protocol-shaded` inside which we contained all to do with protobuf and its subsequent relocation/shading. This module is in essence a copy of much of the old `hbase-protocol` but with an extra shading/relocation step. Core was moved to depend on this new module.
That said, a complication arises around Coprocessor Endpoints (CPEPs). CPEPs depend on public HBase APIs that reference protobuf classes at `com.google.protobuf.*` explicitly. For example, in our Table Interface we have the below as the means by which you obtain a CPEP Service to make invocations against:
```
...
<T extends com.google.protobuf.Service,R> Map<byte[],R> coprocessorService(
Class<T> service, byte[] startKey, byte[] endKey,
org.apache.hadoop.hbase.client.coprocessor.Batch.Call<T,R> callable)
throws com.google.protobuf.ServiceException, Throwable
```
Existing CPEPs will have made reference to core HBase protobufs specifying ServerNames or carrying Mutations. So as to continue being able to service CPEPs and their references to `com.google.protobuf.*` across the upgrade to hbase-2.0.0 and beyond, HBase needs to be able to deal with both `com.google.protobuf.*` references and its internal offset `org.apache.hadoop.hbase.shaded.com.google.protobuf.*` protobufs.
The `hbase-protocol-shaded` module hosts all protobufs used by HBase core.
But for the vestigial CPEP references to the (non-shaded) content of `hbase-protocol`, we keep around most of this module going forward just so it is available to CPEPs. Retaining most of `hbase-protocol` makes for overlapping, 'duplicated' proto instances where some exist as non-shaded/non-relocated here in their old module location but also in the new location, shaded under `hbase-protocol-shaded`. In other words, there is an instance of the generated protobuf class `org.apache.hadoop.hbase.protobuf.generated.ServerName` in hbase-protocol and another generated instance that is the same in all regards except its protobuf references are to the internal shaded version at `org.apache.hadoop.hbase.shaded.protobuf.generated.ServerName` (note the 'shaded' addition in the middle of the package name).
If you extend a proto in `hbase-protocol-shaded` for internal use, consider extending it also in `hbase-protocol` (and regenerating).
Going forward, we will provide a new module of common types for use by CPEPs that will have the same guarantees against change as does our public API. TODO.
# Community
## 198\. Decisions
Feature Branches
Feature Branches are easy to make. You do not have to be a committer to make one. Just request the name of your branch be added to JIRA up on the developer’s mailing list and a committer will add it for you. Thereafter you can file issues against your feature branch in Apache HBase JIRA. Your code you keep elsewhere — it should be public so it can be observed — and you can update dev mailing list on progress. When the feature is ready for commit, 3 +1s from committers will get your feature merged. See [HBase, mail # dev - Thoughts about large feature dev branches](http://search-hadoop.com/m/asM982C5FkS1)
How to set fix version in JIRA on issue resolve
Here is how [we agreed](http://search-hadoop.com/m/azemIi5RCJ1) to set versions in JIRA when we resolve an issue. If master is going to be 2.0.0, and branch-1 1.4.0 then:
* Commit only to master: Mark with 2.0.0
* Commit to branch-1 and master: Mark with 2.0.0, and 1.4.0
* Commit to branch-1.3, branch-1, and master: Mark with 2.0.0, 1.4.0, and 1.3.x
* Commit site fixes: no version
Policy on when to set a RESOLVED JIRA as CLOSED
We [agreed](http://search-hadoop.com/m/4cIKs1iwXMS1) that for issues that list multiple releases in their _Fix Version/s_ field, CLOSE the issue on the release of any of the versions listed; subsequent change to the issue must happen in a new JIRA.
Only transient state in ZooKeeper!
You should be able to kill the data in zookeeper and hbase should ride over it recreating the zk content as it goes. This is an old adage around these parts. We just made note of it now. We also are currently in violation of this basic tenet — replication at least keeps permanent state in zk — but we are working to undo this breaking of a golden rule.
## 199\. Community Roles
### 199.1\. Release Managers
Each maintained release branch has a release manager, who volunteers to coordinate the backporting of new features and bug fixes to that release. The release managers are [committers](https://hbase.apache.org/team-list.html). If you would like your feature or bug fix to be included in a given release, communicate with that release manager. If this list goes out of date or you can’t reach the listed person, reach out to someone else on the list.
> End-of-life releases are not included in this list.
| Release | Release Manager |
| --- | --- |
| 1.2 | Sean Busbey |
| 1.3 | Mikhail Antonov |
| 1.4 | Andrew Purtell |
| 2.0 | Michael Stack |
| 2.1 | Duo Zhang |
## 200\. Commit Message format
We [agreed](http://search-hadoop.com/m/Gwxwl10cFHa1) to the following Git commit message format:
```
HBASE-xxxxx <title>. (<contributor>)
```
If the person making the commit is the contributor, leave off the '(<contributor>)' element.