[JAVA] How logback compresses logs to gz

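Recently at work I have been building a module for storing location information.

It would have been fine to implement it so that it only runs in my own service, but other services on the team need it too, so I built it as a module in the hope of raising the team's productivity.

Along the way I dug into logback's internals and learned a lot. I wrote this post to share what I found.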

RollingPolicy

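The location information is recorded to files.

For that, it is convenient to use logback's RollingFileAppender (a FileAppender that supports a rolling policy) together with a rolling policy.

If you are a Spring developer, you probably use TimeBasedRollingPolicy or SizeAndTimeBasedRollingPolicy often.

When the day rolls over, the policy can compress the previous log to gz (it does so when the file name pattern ends in .gz), which saves disk space.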

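Question

To compress a file to gz, the file has to be read and then written back out.

That made me wonder: if the file is very large, could this cause a fatal problem such as an OOM?

So I got curious about how logback compresses files while avoiding that problem, and analyzed the code.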

Logback gz compression code

Let's start with the code.

/*
 * Logback: the reliable, generic, fast and flexible logging framework.
 * Copyright (C) 1999-2025, QOS.ch. All rights reserved.
 *
 * This program and the accompanying materials are dual-licensed under
 * either the terms of the Eclipse Public License v1.0 as published by
 * the Eclipse Foundation
 *
 *   or (per the licensee's choosing)
 *
 * under the terms of the GNU Lesser General Public License version 2.1
 * as published by the Free Software Foundation.
 */

package ch.qos.logback.core.rolling.helper;

import ch.qos.logback.core.status.ErrorStatus;
import ch.qos.logback.core.status.WarnStatus;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;

public class GZCompressionStrategy extends CompressionStrategyBase {


    @Override
    public void compress(String originalFileName, String compressedFileName, String innerEntryName) {

        File file2gz = new File(originalFileName);

        if (!file2gz.exists()) {
            addStatus(new WarnStatus("The file to compress named [" + originalFileName + "] does not exist.", this));

            return;
        }

        if (!compressedFileName.endsWith(".gz")) {
            compressedFileName = compressedFileName + ".gz";
        }

        File gzedFile = new File(compressedFileName);

        if (gzedFile.exists()) {
            addWarn("The target compressed file named [" + compressedFileName + "] exist already. Aborting file compression.");
            return;
        }

        addInfo("GZ compressing [" + file2gz + "] as [" + gzedFile + "]");
        createMissingTargetDirsIfNecessary(gzedFile);

        try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(originalFileName));
                        GZIPOutputStream gzos = new GZIPOutputStream(new FileOutputStream(compressedFileName))) {

            byte[] inbuf = new byte[BUFFER_SIZE];
            int n;

            while ((n = bis.read(inbuf)) != -1) {
                gzos.write(inbuf, 0, n);
            }

            addInfo("Done GZ compressing [" + file2gz + "] as [" + gzedFile + "]");
        } catch (Exception e) {
            addStatus(new ErrorStatus("Error occurred while compressing [" + originalFileName + "] into [" + compressedFileName + "].", this, e));
        }

        if (!file2gz.delete()) {
            addStatus(new WarnStatus("Could not delete [" + originalFileName + "].", this));
        }
    }

}

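The first part handles the edge cases: it returns early if the file to compress does not exist, or if the target .gz file already exists.

The parts to look at closely are BufferedInputStream and GZIPOutputStream.

Also, BUFFER_SIZE is a static final constant whose default value is 8192.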
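To try the same read/write loop outside of logback, here is a minimal sketch that uses only the JDK. The class name StreamingGzipSketch, the helper names, and the sample log line are my own assumptions for illustration; only the loop shape and the 8192 value come from the logback code above.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingGzipSketch {

    // Same value as logback's BUFFER_SIZE; the name is reused here only for illustration.
    static final int BUFFER_SIZE = 8192;

    /** Compresses source into target while keeping only one BUFFER_SIZE array of file data. */
    static void gzip(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            byte[] inbuf = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(inbuf)) != -1) {
                out.write(inbuf, 0, n); // write only the n bytes that were just read
            }
        }
    }

    /** Reads a .gz file back in one go; fine for a small demo file, not for huge logs. */
    static byte[] gunzip(Path gz) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("location", ".log");
        Path target = Files.createTempFile("location", ".log.gz");
        byte[] original = "lat=37.5665,lon=126.9780\n".repeat(10_000)
                .getBytes(StandardCharsets.UTF_8);
        Files.write(source, original);

        gzip(source, target);

        System.out.println("original bytes:   " + Files.size(source));
        System.out.println("compressed bytes: " + Files.size(target));
        System.out.println("round trip ok:    " + Arrays.equals(original, gunzip(target)));

        Files.delete(source);
        Files.delete(target);
    }
}
```

Unlike logback, this sketch does not delete the source file or report status messages; it only shows that the loop itself never needs more than one 8192-byte array of file data at a time.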

BufferedInputStream

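I had seen this class many times, but never knew exactly what it does.

Going by the name alone, it looks like it places a buffer between the file and the JVM and fetches data in fixed-size chunks.

When I looked into it, that turned out to be correct.

Its internal structure looks like this: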

public class BufferedInputStream extends FilterInputStream {

    private static final int DEFAULT_BUFFER_SIZE = 8192;

    private static final byte[] EMPTY = new byte[0];
    
    
// ......


}

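So by default it pulls data from the file through an 8192-byte internal buffer. (When the caller passes an array at least as large as that buffer, as logback does with its 8192-byte inbuf, the JDK implementation can read straight into the caller's array instead, but each read is still capped at the array's length.)

Because of this, a large amount of data is never read in one go.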
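The bounded read size is easy to observe. The following is a small sketch, not logback code: ChunkedReadSketch and readInChunks are hypothetical names, and an in-memory ByteArrayInputStream stands in for the log file purely so the example is self-contained (a real file would not sit in memory like this).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ChunkedReadSketch {

    /** Returns {totalBytesRead, largestSingleRead} for a source of the given size. */
    static long[] readInChunks(int sourceSize, int chunkSize) {
        byte[] source = new byte[sourceSize]; // stand-in for a large log file
        long total = 0;
        long largest = 0;
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(source))) {
            byte[] inbuf = new byte[chunkSize];
            int n;
            while ((n = in.read(inbuf)) != -1) {
                total += n;                     // every byte is eventually seen
                largest = Math.max(largest, n); // but never more than inbuf.length at once
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {total, largest};
    }

    public static void main(String[] args) {
        long[] result = readInChunks(1_000_000, 8192);
        System.out.println("total read: " + result[0]
                + ", largest single read: " + result[1]);
    }
}
```

However large the source is, the largest single read can never exceed the size of the array passed to read().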

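The next question is how the data that was read gets compressed and saved to the file.

Even if the data is not read all at once, an OOM could still occur if the data were accumulated in memory before being processed.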

GZIPOutputStream

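In logback's gz compression code, we can see that it writes to the file through GZIPOutputStream.

The overall flow is

GZIPOutputStream.write() passes the data to DeflaterOutputStream and updates a CRC-32 of the uncompressed bytes -> Deflater compresses the input -> DeflaterOutputStream drains the compressed bytes into its buf and writes buf to the underlying stream -> on finish, GZIPOutputStream appends the gzip trailer (the CRC-32 and the uncompressed size).

in that order.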
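The checksum step in this flow can be checked from the outside. As a rough sketch (GzipTrailerSketch and its helpers are hypothetical names of mine), the code below compresses a small byte array and reads back the last 8 bytes of the result, which the gzip format defines as the CRC-32 and the uncompressed length, both little-endian.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

public class GzipTrailerSketch {

    /** Compresses input in memory; closing the stream is what writes the trailer. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(sink)) {
            gzos.write(input, 0, input.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads one little-endian unsigned 32-bit field located `fromEnd` bytes before the end. */
    private static long trailerField(byte[] gz, int fromEnd) {
        int value = ByteBuffer.wrap(gz, gz.length - fromEnd, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return value & 0xFFFFFFFFL;
    }

    /** CRC-32 stored in the trailer (the first of the last two 4-byte fields). */
    static long trailerCrc(byte[] gz) {
        return trailerField(gz, 8);
    }

    /** Uncompressed size stored in the trailer (the last 4 bytes). */
    static long trailerSize(byte[] gz) {
        return trailerField(gz, 4);
    }

    /** CRC-32 computed directly over the uncompressed input, for comparison. */
    static long crc32(byte[] input) {
        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "lat=37.5665,lon=126.9780\n".getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip(input);
        System.out.println("crc matches:  " + (trailerCrc(gz) == crc32(input)));
        System.out.println("size matches: " + (trailerSize(gz) == input.length));
    }
}
```

Since the CRC-32 is updated incrementally on every write(), producing the trailer does not require holding the whole file in memory either.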

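Deflater is backed by native C code, so I did not dig into its source. When I asked ChatGPT, it said Deflater also keeps its own internal buffer, of about 16 KB. (I have not verified this.) Either way, that native state has a fixed size and does not grow with the file.

#define Z_BUFSIZE 16384 // C buffer size, source: ChatGPT (unverified)

DeflaterOutputStream has a default buffer size of 512 in the code.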

We can see that logback uses the GZIPOutputStream(OutputStream out) constructor.

The documentation for this constructor says:

"Creates a new output stream with a default buffer size."

and nothing more.

(Source: https://docs.oracle.com/javase/8/docs/api/java/util/zip/GZIPOutputStream.html)

So what is the default buffer size?

Following the code,

public GZIPOutputStream(OutputStream out) throws IOException {
    this(out, 512, false);
}


public GZIPOutputStream(OutputStream out, int size, boolean syncFlush)throws IOException {
    super(out, out != null ? new Deflater(Deflater.DEFAULT_COMPRESSION, true) : null,
          size,
          syncFlush);
    usesDefaultDeflater = true;
    writeHeader();
    crc.reset();
}




public DeflaterOutputStream(OutputStream out,
                                Deflater def,
                                int size,
                                boolean syncFlush) {
    super(out);
    if (out == null || def == null) {
        throw new NullPointerException();
    } else if (size <= 0) {
        throw new IllegalArgumentException("buffer size <= 0");
    }
    this.def = def;
    // Here: the output buffer is allocated with the given size.
    this.buf = new byte[size];
    this.syncFlush = syncFlush;
}

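We can see that 512 bytes is used as the default.

Every stage works through a small fixed-size buffer like this (an 8192-byte read buffer, a 512-byte output buffer, plus the Deflater's native state), so memory use does not grow with the file size. That is why an OOM is very unlikely no matter how large the file is.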
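One way to see the 512-byte output buffer in action is to give GZIPOutputStream a sink that records the size of every write it receives. This is only a sketch under my own assumptions: RecordingSink and compressRandom are hypothetical names, and random bytes are used as input so that the compressed output stays large enough to produce many writes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class GzipWriteSizeSketch {

    /** Hypothetical sink that discards the data but records how it was written. */
    static final class RecordingSink extends OutputStream {
        long totalBytes;
        int largestWrite;

        @Override
        public void write(int b) {
            totalBytes++;
            largestWrite = Math.max(largestWrite, 1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            totalBytes += len;
            largestWrite = Math.max(largestWrite, len);
        }
    }

    /** Feeds inputSize random bytes through GZIPOutputStream in 8192-byte chunks. */
    static RecordingSink compressRandom(int inputSize) {
        RecordingSink sink = new RecordingSink();
        byte[] chunk = new byte[8192];
        Random random = new Random(42);
        try (GZIPOutputStream gzos = new GZIPOutputStream(sink)) {
            for (int written = 0; written < inputSize; written += chunk.length) {
                random.nextBytes(chunk); // random data barely compresses
                gzos.write(chunk, 0, Math.min(chunk.length, inputSize - written));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        RecordingSink sink = compressRandom(1_000_000);
        System.out.println("compressed bytes written: " + sink.totalBytes);
        System.out.println("largest single write:     " + sink.largestWrite);
    }
}
```

With the one-argument constructor, the largest single write should not exceed 512 bytes even though the input is fed in 8192-byte chunks, which matches the buf size seen in the DeflaterOutputStream constructor.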

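One more thing I am curious about: if something goes wrong with the disk, or the data is written to a NAS, the writes can be delayed.

The write calls would then block, so could that cause a performance problem?

I will look into this part later.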

License: Attribution, NonCommercial, ShareAlike

하용권
