Presto JVM性能调优GCLocker垃圾回收-Devlive 开源社区

Presto集群中某节点丢失上报心跳状态(服务未宕机),查看该节点日志发现以上gc清理日志.

[510515.508s][warning][gc,alloc] http-worker-159944: Retried waiting for GCLocker too often allocating 11648 words
[510515.508s][warning][gc,alloc] page-buffer-client-callback-8474: Retried waiting for GCLocker too often allocating 19073 words
[510515.508s][warning][gc,alloc] http-worker-153295: Retried waiting for GCLocker too often allocating 14450 words
[510515.508s][warning][gc,alloc] http-worker-160657: Retried waiting for GCLocker too often allocating 15585 words
[510515.508s][warning][gc,alloc] http-worker-159719: Retried waiting for GCLocker too often allocating 20110 words
[510515.508s][warning][gc,alloc] http-worker-161471: Retried waiting for GCLocker too often allocating 13867 words
[510515.508s][warning][gc,alloc] http-worker-159190: Retried waiting for GCLocker too often allocating 7651 words
[510561.503s][warning][gc,alloc] task-processor-82635: Retried waiting for GCLocker too often allocating 18700 words
[510561.503s][warning][gc,alloc] task-processor-81526: Retried waiting for GCLocker too often allocating 18910 words
[510561.504s][warning][gc,alloc] task-processor-81315: Retried waiting for GCLocker too often allocating 15505 words
[510561.504s][warning][gc,alloc] task-processor-81319: Retried waiting for GCLocker too often allocating 15505 words

通过日志和最近的查询历史来看有大对象分配逻辑导致清理gc变慢,gc1的大文件清理和分配逻辑如下详细文件GitHub:

for (uint try_count = 1, gclocker_retry_count = 0; /* we'll return */; try_count += 1) {
    bool should_try_gc;
    uint gc_count_before;

    {
      MutexLockerEx x(Heap_lock);

      result = humongous_obj_allocate(word_size);
      if (result != NULL) {
        size_t size_in_regions = humongous_obj_size_in_regions(word_size);
        g1_policy()->add_bytes_allocated_in_old_since_last_gc(size_in_regions * HeapRegion::GrainBytes);
        return result;
      }

      should_try_gc = !GCLocker::needs_gc();
      gc_count_before = total_collections();
    }

    if (should_try_gc) {
      bool succeeded;
      result = do_collection_pause(word_size, gc_count_before, &succeeded,
                                   GCCause::_g1_humongous_allocation);
      if (result != NULL) {
        assert(succeeded, "only way to get back a non-NULL result");
        log_trace(gc, alloc)("%s: Successfully scheduled collection returning " PTR_FORMAT,
                             Thread::current()->name(), p2i(result));
        return result;
      }

      if (succeeded) {
        log_trace(gc, alloc)("%s: Successfully scheduled collection failing to allocate "
                             SIZE_FORMAT " words", Thread::current()->name(), word_size);
        return NULL;
      }
      log_trace(gc, alloc)("%s: Unsuccessfully scheduled collection allocating " SIZE_FORMAT "",
                           Thread::current()->name(), word_size);
    } else {
      // Failed to schedule a collection.
      if (gclocker_retry_count > GCLockerRetryAllocationCount) {
        log_warning(gc, alloc)("%s: Retried waiting for GCLocker too often allocating "
                               SIZE_FORMAT " words", Thread::current()->name(), word_size);
        return NULL;
      }

      GCLocker::stall_until_clear();
      gclocker_retry_count += 1;
    }
  }

代码if (gclocker_retry_count > GCLockerRetryAllocationCount)标记了gc清理异常信息

需要注意的是GCLockerRetryAllocationCount的默认值是2,表示当分配中的垃圾回收次数超过这个阈值之后则直接失败。

为了保证gc的正常清理,需要调整此参数,根据cpu的核数调整至100(或者cpu的核数等值)

调整方式: 修改jvm增加以下配置

-XX:+UnlockDiagnosticVMOptions
-XX:GCLockerRetryAllocationCount=100

-XX:+UnlockDiagnosticVMOptions一定要添加否则GCLockerRetryAllocationCount将不会生效

目录CONTENT

Presto JVM性能调优GCLocker垃圾回收

评论区