1. 01 Dec, 2021 7 commits
  2. 30 Nov, 2021 1 commit
    • perf(rollup): use NSplit API from sroar to improve rollup performance (#8092) · da9655b7
      Naman Jain committed
      This PR improves the performance of the rollups. Also, it fixes memory issues of the bulk loader.
      
      - Use the optimized Split API from sroar to split the bitmap during rollup. This replaces the recursive binary split, which was very slow.
      - In the bulk loader, use the HandoverSkipList API instead of WriteBatch to avoid the memory constraints of a huge batch write.
      - Introduce BitForbidPosting, which prevents storing large posting lists whose number of splits exceeds the max-splits option of the --limit flag. Once a posting list is marked as forbidden, it cannot be recovered.
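
      The difference between a recursive binary split and a single-pass N-way split can be sketched on a plain sorted slice. This is only an illustration under stated assumptions: `splitN` and `maxPerPart` are hypothetical names, and the code does not use or mirror the sroar API.

      ```go
      package main

      import "fmt"

      // splitN cuts a sorted slice of uids into consecutive parts holding at most
      // maxPerPart entries each, in a single pass over the input.
      // Hypothetical helper for illustration; it is not the sroar API.
      func splitN(uids []uint64, maxPerPart int) [][]uint64 {
      	if maxPerPart <= 0 || len(uids) == 0 {
      		return nil
      	}
      	parts := make([][]uint64, 0, (len(uids)+maxPerPart-1)/maxPerPart)
      	for start := 0; start < len(uids); start += maxPerPart {
      		end := start + maxPerPart
      		if end > len(uids) {
      			end = len(uids)
      		}
      		// Each part is a sub-slice of the input; no data is copied.
      		parts = append(parts, uids[start:end])
      	}
      	return parts
      }

      func main() {
      	uids := []uint64{1, 2, 3, 5, 8, 13, 21}
      	for i, p := range splitN(uids, 3) {
      		fmt.Printf("part %d: %v\n", i, p)
      	}
      }
      ```

      Every part boundary is computed directly from the index, so the input is visited once instead of being halved repeatedly.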
  3. 29 Nov, 2021 1 commit
  4. 27 Nov, 2021 1 commit
  5. 24 Nov, 2021 1 commit
  6. 16 Nov, 2021 1 commit
  7. 09 Nov, 2021 1 commit
    • fix(race): fix multiple race conditions (#8069) · 5429202e
      Naman Jain committed
      Fixes 2 race conditions:

      - Txn's MaxAssignedSeen and AppliedIndexSeen were accessed without locks, which led to a read-write race.
      - Rollups are not supposed to modify List.plist, but the rollup assigned it to out.plist, which was then modified while running splits.
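
      The first fix follows a common pattern: every read and write of the shared fields goes through the same mutex. A minimal sketch, assuming a hypothetical `txnState` type; only the two field names are taken from the commit message, and the "keep the highest value" logic is an assumption made for the example.

      ```go
      package main

      import (
      	"fmt"
      	"sync"
      )

      // txnState is a hypothetical stand-in for the transaction object.
      type txnState struct {
      	mu               sync.Mutex
      	maxAssignedSeen  uint64
      	appliedIndexSeen uint64
      }

      // update records the highest values observed so far, under the lock.
      func (t *txnState) update(maxAssigned, appliedIndex uint64) {
      	t.mu.Lock()
      	defer t.mu.Unlock()
      	if maxAssigned > t.maxAssignedSeen {
      		t.maxAssignedSeen = maxAssigned
      	}
      	if appliedIndex > t.appliedIndexSeen {
      		t.appliedIndexSeen = appliedIndex
      	}
      }

      // snapshot reads both fields under the same lock, so a reader never
      // races with a concurrent update.
      func (t *txnState) snapshot() (uint64, uint64) {
      	t.mu.Lock()
      	defer t.mu.Unlock()
      	return t.maxAssignedSeen, t.appliedIndexSeen
      }

      func main() {
      	var t txnState
      	var wg sync.WaitGroup
      	for i := uint64(1); i <= 100; i++ {
      		wg.Add(1)
      		go func(v uint64) {
      			defer wg.Done()
      			t.update(v, v*2)
      		}(i)
      	}
      	wg.Wait()
      	maxAssigned, appliedIndex := t.snapshot()
      	fmt.Println(maxAssigned, appliedIndex)
      }
      ```

      Running a program like this with `go run -race` is one way to confirm that no unsynchronized access remains.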
  8. 30 Oct, 2021 1 commit
  9. 19 Oct, 2021 1 commit
    • fix(sort): Only filter out nodes with positive offsets. (#8077) · 74d833ce
      Daniel Mai committed
      Negative offsets (e.g., offset: -4) can cause panics when sorting. This can happen when the query has the following characteristics:

      1. The query sorts on an indexed predicate.
      2. The results also include nodes that don't have the sorted predicate.
      3. A negative offset is used.
      
          (panic trace is from v20.11.2-rc1-23-gaf5030a5)
          panic: runtime error: slice bounds out of range [-4:]
          goroutine 1762633 [running]:
          github.com/dgraph-io/dgraph/worker.sortWithIndex(0x1fb12e0, 0xc00906a880, 0xc009068660, 0x0)
                  /ext-go/1/src/github.com/dgraph-io/dgraph/worker/sort.go:330 +0x244d
          github.com/dgraph-io/dgraph/worker.processSort.func2(0x1fb12e0, 0xc00906a880, 0xc009068660, 0xc0090686c0)
                  /ext-go/1/src/github.com/dgraph-io/dgraph/worker/sort.go:515 +0x3f
          created by github.com/dgraph-io/dgraph/worker.processSort
                  /ext-go/1/src/github.com/dgraph-io/dgraph/worker/sort.go:514 +0x52a
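
      The panic above is what Go raises when a slice expression receives a negative index at run time. A minimal sketch of the guard implied by the commit title, assuming a hypothetical `applyOffset` helper; the actual change in worker/sort.go may be structured differently.

      ```go
      package main

      import "fmt"

      // applyOffset drops the first `offset` uids, but only when offset is
      // positive. A zero or negative offset leaves the slice untouched instead
      // of reaching a slice expression such as uids[-4:].
      // Hypothetical helper for illustration.
      func applyOffset(uids []uint64, offset int) []uint64 {
      	if offset <= 0 {
      		return uids
      	}
      	if offset >= len(uids) {
      		return uids[:0]
      	}
      	return uids[offset:]
      }

      func main() {
      	uids := []uint64{10, 20, 30}
      	fmt.Println(applyOffset(uids, -4))
      	fmt.Println(applyOffset(uids, 1))
      	fmt.Println(applyOffset(uids, 5))
      }
      ```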
  10. 15 Oct, 2021 1 commit
  11. 09 Oct, 2021 1 commit
  12. 06 Oct, 2021 1 commit
  13. 05 Oct, 2021 1 commit
  14. 04 Oct, 2021 1 commit
  15. 03 Oct, 2021 1 commit
  16. 01 Oct, 2021 3 commits
    • fix(split): enable split of posting list with single plist (#8062) · 3592353c
      Naman Jain committed
      When there was a single plist to start from and nothing in mutationMap, the encode did not happen, so the splits were not generated.
      This PR fixes that issue.
    • fix(restore): Do not retry restore proposal (#8058) · 69b186a0
      Ahsan Barkati committed
      Do not retry the restore proposal. Retrying can cause issues in edge cases.
      Consider the following scenario:

      1. alpha-2 gets the restore request (the leader is alpha-0).
      2. alpha-2 sends the request to alpha-0 (the leader).
      3. alpha-0 calls proposeAndWait, which proposes the request (index 24) at time=15:56:10.
      4. While alpha-0 is still waiting for the proposal to be applied, the `Restore` RPC call from alpha-2 gets a "transport closing error" at time=15:59:08.
      5. Transport closing is a retriable error, so alpha-2 tries proposeOrSend again. This time the leader is alpha-1, so alpha-2 sends the request to alpha-1.
      6. alpha-1 proposes the restore request (index 28) at time=15:59:09.
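
      The scenario can be reduced to a toy model: the first proposal is accepted, but the caller only sees a transport error, so a retry adds a second proposal. Everything in this sketch (`proposalLog`, `sendWithRetry`, `sendOnce`, `errTransportClosing`) is hypothetical and only illustrates why a non-idempotent request is sent once.

      ```go
      package main

      import (
      	"errors"
      	"fmt"
      )

      var errTransportClosing = errors.New("transport is closing")

      // proposalLog records every proposal that reaches a leader.
      type proposalLog struct {
      	indexes []int
      	next    int
      }

      // propose always records the proposal. When failAfterAccept is true it
      // still returns a transport error, simulating an RPC that breaks while
      // the leader already holds the proposal.
      func (l *proposalLog) propose(failAfterAccept bool) error {
      	l.next++
      	l.indexes = append(l.indexes, l.next)
      	if failAfterAccept {
      		return errTransportClosing
      	}
      	return nil
      }

      // sendWithRetry retries on error; only the first attempt loses its
      // connection in this model.
      func sendWithRetry(l *proposalLog, attempts int) error {
      	var err error
      	for i := 0; i < attempts; i++ {
      		if err = l.propose(i == 0); err == nil {
      			return nil
      		}
      	}
      	return err
      }

      // sendOnce surfaces the error to the caller instead of retrying.
      func sendOnce(l *proposalLog) error {
      	return l.propose(true)
      }

      func main() {
      	retried := &proposalLog{}
      	fmt.Println(sendWithRetry(retried, 2), len(retried.indexes))

      	once := &proposalLog{}
      	fmt.Println(sendOnce(once), len(once.indexes))
      }
      ```

      With retries, the log ends up holding two restore proposals; without them, the caller gets the error and the log holds one.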
    • fix(txn): Fix data races in transaction code (#8060) · cf22bf7d
      Ahsan Barkati committed
      Fix data races.
  17. 28 Sep, 2021 1 commit
  18. 24 Sep, 2021 2 commits
  19. 23 Sep, 2021 2 commits
  20. 22 Sep, 2021 1 commit
  21. 21 Sep, 2021 2 commits
  22. 20 Sep, 2021 2 commits
    • fix(lambda): make lambda active only after successful start (#8036) · eaee2db2
      Naman Jain committed
      A logical race condition causes a panic. It happens when the node process is killed before it completes its initialization. This PR fixes that issue.
      
      This PR also increases the default restart timeout from 10s to 30s to make it more generous.
    • fix(probe): do not contend for lock in lazy load (#8037) · 5ad40d84
      Naman Jain committed
      Earlier, the admin server's mutex was used to protect the GraphQL schema map. The schema is now kept in the schema store, which handles concurrency internally, so we no longer need to take the admin server's read lock to access the schema.

      /probe/graphql is used as a health check and is called very frequently. Taking the read lock on the admin server mutex makes /probe/graphql requests block during lazy loading when a restore operation is triggered at startup. This leads to a large number of goroutines being spun up.
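
      The idea of a store that handles its own concurrency, so that a frequent health check never waits on a server-wide lock, can be sketched with `sync/atomic`. The `schemaStore` type, the `probe` function, and the "loading"/"up" strings are assumptions made for this example, not Dgraph's actual types or responses.

      ```go
      package main

      import (
      	"fmt"
      	"sync/atomic"
      )

      // schemaStore is a hypothetical store that is safe for concurrent use
      // without any external lock.
      type schemaStore struct {
      	current atomic.Value // always holds a string once set
      }

      // set publishes a new schema atomically.
      func (s *schemaStore) set(schema string) {
      	s.current.Store(schema)
      }

      // get returns the schema and whether one has been published yet.
      func (s *schemaStore) get() (string, bool) {
      	v, ok := s.current.Load().(string)
      	return v, ok
      }

      // probe answers a health check by reading the store directly. It never
      // blocks, even while another goroutine is busy preparing a new schema.
      func probe(s *schemaStore) string {
      	if _, ok := s.get(); !ok {
      		return "loading"
      	}
      	return "up"
      }

      func main() {
      	s := &schemaStore{}
      	fmt.Println(probe(s))
      	s.set("type T { id: ID! }")
      	fmt.Println(probe(s))
      }
      ```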
  23. 18 Sep, 2021 1 commit
    • opt(Restore): Make restore map phase faster (#8038) · 00600944
      Manish R Jain committed
      With this change, we can get a sustained output throughput of ~450 MBps for the map phase. This is what we got on a 48-core AWS machine.
      
      ```
      alpha1    | I0917 08:03:27.921912      17 restore_map.go:547] Restore MAP 01h04m30s len(reqCh): 0 len(writeCh): 0 read: 472 GiB. output: 1.6 TiB. rate: 437 MiB/sec. nextFileId: 2474 writers: 0 jemalloc: 0 B.
      alpha1    | I0917 08:03:27.921934      17 restore_map.go:559] Restore MAP Done in 01h04m30s.
      ```
      
      Changes:
      * Make numGo equal to number of cores
      * Don't throttle.
      * Create a bigger buffer before calling merge
      * Rewrite mapper to remove sendForWriting out of the critical path.
      * Fix up a deadlock
      * Fix up the mapper
      * Add write ch
      * Use many goroutines to create map files
      * Use fewer goroutines
      * Use 75% of the cores for mapping
      * Stagger the writes
      * Half writers can write at a time
      * Print num writers
      * Range for file would be 1/4 to 1
      * Reduce number of pending requests
      Co-authored-by: Ahsan Barkati <ahsanbarkati@gmail.com>
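
      Two items from the list above, sizing the mapper pool from the core count and letting only half of the workers write at a time, can be sketched with a worker pool and a semaphore channel. This is an illustration under assumptions: `runMappers`, the doubling "map" step, and the exact ratios are placeholders, and the list is a change history, so the final values in restore_map.go may differ.

      ```go
      package main

      import (
      	"fmt"
      	"runtime"
      	"sync"
      )

      // runMappers processes items with numGo goroutines and lets at most
      // numGo/2 of them (minimum 1) be inside the write step at once.
      // Hypothetical helper for illustration.
      func runMappers(items []int, numGo int, write func(int)) {
      	if numGo < 1 {
      		numGo = 1
      	}
      	maxWriters := numGo / 2
      	if maxWriters < 1 {
      		maxWriters = 1
      	}
      	writeSem := make(chan struct{}, maxWriters) // counting semaphore
      	reqCh := make(chan int)

      	var wg sync.WaitGroup
      	for i := 0; i < numGo; i++ {
      		wg.Add(1)
      		go func() {
      			defer wg.Done()
      			for it := range reqCh {
      				mapped := it * 2 // placeholder for the CPU-bound map step
      				writeSem <- struct{}{}
      				write(mapped)
      				<-writeSem
      			}
      		}()
      	}
      	for _, it := range items {
      		reqCh <- it
      	}
      	close(reqCh)
      	wg.Wait()
      }

      func main() {
      	numGo := runtime.NumCPU() * 3 / 4 // 75% of the cores
      	var mu sync.Mutex
      	total := 0
      	runMappers([]int{1, 2, 3, 4}, numGo, func(v int) {
      		mu.Lock()
      		total += v
      		mu.Unlock()
      	})
      	fmt.Println(total)
      }
      ```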
  24. 17 Sep, 2021 1 commit
    • fix(contrib): Quote strings in backup script. (#8035) · f6e8779a
      Daniel Mai committed
      Quote the bash strings in the backup script. This is especially important for command-line arguments that can contain shell special characters, such as passwords. For example, when a password is passed in as 'abc!123$456', bash should not interpret the ! and $ characters.
  25. 16 Sep, 2021 4 commits
    • feat(magicNumber): Introduce magic number (#8032) · 59f6e7a7
      Ahsan Barkati committed
      The magic number is a unique identifier for Dgraph's data format. In 21.09, we changed the data format of posting lists by bringing in sroar. Running Dgraph with the sroar change on an older p directory can cause data corruption. The magic number prevents the DB from starting up if the version is not compatible.
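
      A startup check of this kind can be sketched as a comparison between the value stored with the data and the value the binary expects. The constant `currentMagic`, its value, and the `checkMagic` function are made up for this example; the commit message does not state the real value or where it is stored.

      ```go
      package main

      import (
      	"errors"
      	"fmt"
      )

      // currentMagic is a made-up value for illustration only.
      const currentMagic uint16 = 1

      var errIncompatible = errors.New("data format is not compatible with this binary")

      // checkMagic refuses to proceed when the on-disk value differs from the
      // one this binary was built for.
      func checkMagic(onDisk uint16) error {
      	if onDisk != currentMagic {
      		return fmt.Errorf("magic number %d on disk, want %d: %w",
      			onDisk, currentMagic, errIncompatible)
      	}
      	return nil
      }

      func main() {
      	fmt.Println(checkMagic(currentMagic))
      	fmt.Println(checkMagic(0))
      }
      ```

      Failing fast at startup keeps an incompatible binary from writing to a directory it would otherwise corrupt.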
    • fix(lambda): shutdown node processes when alpha gets killed (#8027) · d3285b84
      Naman Jain committed
      We were already handling the graceful shutdown of node processes when alpha shuts down, by passing closer.Ctx() so that the node process shuts down when alpha stops. But that is not sufficient when alpha panics.
      If alpha panics, it gets stuck because of the node processes. This happens when Sentry is also enabled: Sentry waits for all processes (children as well as grandchildren) to complete.
    • upgrade(sroar): Use latest sroar (#8028) · 2977e5f9
      Ahsan Barkati committed
      Use the latest sroar which has the fix for the AndNot bug.
    • opt(sroar): Optimise the usage of sroar (#8022) · a22d7bd5
      Ahsan Barkati committed
      Bring in latest sroar and optimize its usage in Dgraph.