Commit 43bc7904 · Author: Ivan Kelly · Committer: 冉小龙

[TIEREDSTORAGE] Only seek when reading unexpected entry (#5356)

* [TIEREDSTORAGE] Only seek when reading unexpected entry

The normal pattern for reading from an offloaded ledger is that the
reader will read the ledger sequentially from start to end. This means
that once a user reads an entry, we should expect that the next entry
they read will be the next entry in the ledger.

The initial implementation of the BlobStoreBackedReadHandleImpl (and
the S3 variant that preceded it) didn't take this into
account. Instead it did a lookup in the index each time, to find the
block that contained the entry, and then read forward in the block
until it found the entry requested. This is fine for the first few
entries in the block, not so much for the last.
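
To make the cost concrete, here is a rough sketch, not taken from the
patch. It assumes the worst case where every read call asks for a single
entry, so each call seeks to the start of the block and skips every
earlier entry. `ScanCost` and its method are hypothetical names.

```java
// Hypothetical helper: counts how many entries are skipped over when a block
// is read one entry per call and every call restarts from the block start.
class ScanCost {
    static long skipsWithSeekPerRead(long entriesInBlock) {
        long skips = 0;
        for (long i = 0; i < entriesInBlock; i++) {
            // Reading the i-th entry (0-based) skips the i entries before it.
            skips += i;
        }
        return skips; // equals n * (n - 1) / 2
    }
}
```

Under that assumption a block of 1000 entries costs 499500 skipped
entries, whereas a reader that keeps its position skips none.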

This PR changes the read behaviour to only seek if the entryId read
from the block is either:
- greater than the entry we were expecting to read, in which case we
  need to seek backwards in the block.
- less than the entry expected, but also belonging to a different
  block to the expected entry, in which case we need to seek to the
  correct block.
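
The two conditions above can be sketched in isolation as follows. This is
an illustration only: `SeekDecision`, `addBlock` and `seekTarget` are
hypothetical names, and a `TreeMap` stands in for the real offload index
(it assumes every entry id looked up is at or after the first block's
first entry).

```java
import java.util.TreeMap;

// Hypothetical stand-in for the offload index: maps the first entry id of
// each block to that block's data offset.
class SeekDecision {
    private final TreeMap<Long, Long> blockOffsets = new TreeMap<>();

    SeekDecision addBlock(long firstEntryId, long dataOffset) {
        blockOffsets.put(firstEntryId, dataOffset);
        return this;
    }

    // Data offset of the block that contains entryId.
    private long blockFor(long entryId) {
        return blockOffsets.floorEntry(entryId).getValue();
    }

    // Returns the offset to seek to, or -1 to keep reading forward.
    long seekTarget(long entryIdRead, long nextExpectedId) {
        if (entryIdRead > nextExpectedId) {
            // Read past the expected entry: go back to the start of its block.
            return blockFor(nextExpectedId);
        }
        if (entryIdRead < nextExpectedId
                && blockFor(entryIdRead) != blockFor(nextExpectedId)) {
            // Behind the expected entry and in a different block: jump there.
            return blockFor(nextExpectedId);
        }
        // Expected entry, or an earlier entry in the same block: no seek.
        return -1;
    }
}
```

With blocks starting at entries 0 (offset 100) and 10 (offset 5000),
`seekTarget(12, 5)` returns 100, `seekTarget(8, 12)` returns 5000, and
`seekTarget(3, 5)` returns -1 because the reader only needs to skip
forward within the same block.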

This change improves read performance significantly. Ad hoc benchmarks
show that we can read from offloaded topics at ~160MB/s whereas
previously we could only manage <10MB/s.

* Revert it back to debug
Parent ebaf97cc
......@@ -116,7 +116,7 @@ public class BlobStoreBackedInputStreamImpl extends BackedInputStream {
@Override
public void seek(long position) {
-log.debug("Seeking to {} on {}/{}, current position {}", position, bucket, key, cursor);
+log.debug("Seeking to {} on {}/{}, current position {} (bufStart:{}, bufEnd:{})", position, bucket, key, cursor, bufferOffsetStart, bufferOffsetEnd);
if (position >= bufferOffsetStart && position <= bufferOffsetEnd) {
long newIndex = position - bufferOffsetStart;
buffer.readerIndex((int)newIndex);
......
......@@ -106,14 +106,11 @@ public class BlobStoreBackedReadHandleImpl implements ReadHandle {
List<LedgerEntry> entries = new ArrayList<LedgerEntry>();
long nextExpectedId = firstEntry;
try {
-OffloadIndexEntry entry = index.getIndexEntryForEntry(firstEntry);
-inputStream.seek(entry.getDataOffset());
while (entriesToRead > 0) {
int length = dataStream.readInt();
if (length < 0) { // hit padding or new block
-inputStream.seekForward(index.getIndexEntryForEntry(nextExpectedId).getDataOffset());
-length = dataStream.readInt();
+inputStream.seek(index.getIndexEntryForEntry(nextExpectedId).getDataOffset());
+continue;
}
long entryId = dataStream.readLong();
......@@ -126,6 +123,14 @@ public class BlobStoreBackedReadHandleImpl implements ReadHandle {
}
entriesToRead--;
nextExpectedId++;
+} else if (entryId > nextExpectedId) {
+inputStream.seek(index.getIndexEntryForEntry(nextExpectedId).getDataOffset());
+continue;
+} else if (entryId < nextExpectedId
+&& !index.getIndexEntryForEntry(nextExpectedId).equals(
+index.getIndexEntryForEntry(entryId))) {
+inputStream.seek(index.getIndexEntryForEntry(nextExpectedId).getDataOffset());
+continue;
} else if (entryId > lastEntry) {
log.info("Expected to read {}, but read {}, which is greater than last entry {}",
nextExpectedId, entryId, lastEntry);
......
......@@ -123,5 +123,11 @@ public class DataBlockHeaderImpl implements DataBlockHeader {
// true means the input stream will release the ByteBuf on close
return new ByteBufInputStream(out, true);
}
+@Override
+public String toString() {
+return String.format("DataBlockHeader(len:%d,hlen:%d,firstEntry:%d)",
+blockLength, headerLength, firstEntryId);
+}
}
......@@ -58,5 +58,11 @@ public class OffloadIndexEntryImpl implements OffloadIndexEntry {
this.offset = offset;
this.blockHeaderSize = blockHeaderSize;
}
+@Override
+public String toString() {
+return String.format("[eid:%d, part:%d, offset:%d, doffset:%d]",
+entryId, partId, offset, getDataOffset());
+}
}