Good practice to keep in mind when writing files, in any language


What I want to say

When writing important files, you need to write them in a way that accounts for an unexpected OS shutdown. If you do not, half-written or empty files can be produced, and these can be fatal at system startup or for any system that consumes them.

The examples use C, Java, Python, and JavaScript (Node.js), but the issue needs to be addressed in almost every language.

Background

A fatal failure occurred in which software running in production would no longer start.

When I collected and analyzed the logs and the config file, I found that the config file was corrupted.

The config file is read at startup and rewritten whenever necessary. Following the code, I realized that if the process is forcibly terminated in the middle of a write, the file can be left half-written.

The device is shut down by simply cutting the power, so a power cut must have miraculously coincided with a write.

What I tried first

If you write directly to the config file, it can be in a half-written state, even if only briefly (for example, you want to write 10 characters but only 1 has been written so far). If you instead finish writing to config.yml.tmp and then rename it to config.yml, the update becomes atomic: the file is either fully updated or not updated at all!
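
As a rough illustration of this first attempt, the write-then-rename idea might look like the following Python sketch. The helper name replace_via_tmp and the demo path are hypothetical and not taken from the original code. Note that nothing here forces the data onto the disk; it only controls the order of operations.

```python
import os
import tempfile


def replace_via_tmp(path: str, text: str) -> None:
    """Write text to '<path>.tmp', then rename it over path (hypothetical helper)."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()  # hands Python's buffer to the OS, not necessarily to the disk
    # os.replace swaps the file in one step, so readers see the old or the new file
    os.replace(tmp_path, path)


# Demo in a throwaway directory (assumed location, for illustration only)
demo_dir = tempfile.mkdtemp()
config_path = os.path.join(demo_dir, "config.yml")
replace_via_tmp(config_path, "key: value\n")
```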

Result 1

I released the fix, fully satisfied, but the failure to start happened again. This time the config file was 0 KB. If writing the tmp file fails, the rename never happens, so according to the program flow a 0 KB file should have been absolutely impossible. The flush was explicit, and the file was closed properly.

While flipping through the O'Reilly book Linux System Programming that I had at hand, I found this item:

  • fsync and fdatasync

What? Writes a dirty buffer to disk?

Only at this point did I finally understand both the mistake and the fix. I may have learned this at some point, but the knowledge did not come to mind when I needed it.

Dirty Buffers

Data is not written to disk the moment the program performs a write operation. It is first held in memory in the form of a dirty buffer.

Writing to disk is extremely slow, so if every write went to disk one by one, the program would be drastically slower (by a factor of 100 or more). To avoid this, virtually all modern file systems use this technique. The process delegates the slow write operation to the operating system and carries on with its own work. When the OS actually writes to disk depends on the file system involved. In some cases it can take more than 10 seconds, which is more than enough time for a file to end up broken.

If the OS goes down in this state because of an unexpected power loss or similar, the result is a broken or empty file, even though the program flow performed flush and close. In other words, config.yml.tmp itself was half-written, so renaming it to config.yml only produced a half-written config.yml.

When fsync() is called, the dirty buffer is written to disk immediately, and the call does not return until the write has completed.
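
Putting the pieces together, a minimal Python sketch of the tmp-file approach with fsync() added might look like this. The helper name atomic_write is hypothetical. The final directory fsync is an assumption that goes beyond what the article describes: on POSIX systems it is commonly done so that the rename itself also reaches the disk, and it is skipped here where os.O_DIRECTORY is unavailable.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to '<path>.tmp', force it to disk, then rename over path."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # Python buffer -> OS (still only a dirty buffer)
        os.fsync(f.fileno())  # dirty buffer -> disk; returns when the write is done
    os.replace(tmp_path, path)

    # Assumption beyond the article: also sync the containing directory (POSIX only)
    if hasattr(os, "O_DIRECTORY"):
        dir_path = os.path.dirname(os.path.abspath(path))
        dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


# Demo in a throwaway directory (assumed location, for illustration only)
work_dir = tempfile.mkdtemp()
target = os.path.join(work_dir, "config.yml")
atomic_write(target, b"key: value\n")
```

With this ordering, the rename only happens after the contents of the tmp file have been forced to disk.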

fsync() and fdatasync()

In Linux, a file consists of two types of data:

  • Metadata called inodes
  • The data of the file itself

The inode holds metadata such as the file's modification time, that is, the kind of information shown when you list a directory.

fsync() writes both. fdatasync() writes the data of the file itself, plus only the metadata needed to read that data back (such as the file size).

If the file is not updated often, or if performance is not a concern, fsync() is fine.

Use fdatasync() when it is acceptable, in the worst case, for inode metadata such as the last-modified time not to be updated, or when the file is updated frequently and performance is a concern.
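
The choice between the two calls can be sketched in Python as follows. The helper name sync_to_disk and its data_only parameter are hypothetical. os.fdatasync() is not available on every platform, so this sketch falls back to os.fsync() when it is missing.

```python
import os
import tempfile


def sync_to_disk(f, data_only: bool = False) -> None:
    """Flush a Python file object and force it to disk (hypothetical helper)."""
    f.flush()  # Python buffer -> OS
    if data_only and hasattr(os, "fdatasync"):
        os.fdatasync(f.fileno())  # file data, skipping non-essential metadata
    else:
        os.fsync(f.fileno())      # file data and inode metadata


# Demo: a frequently appended file uses the cheaper call,
# a rarely written file uses the full fsync.
sync_dir = tempfile.mkdtemp()
log_path = os.path.join(sync_dir, "events.log")
with open(log_path, "a", encoding="utf-8") as log:
    for i in range(3):
        log.write(f"event {i}\n")
        sync_to_disk(log, data_only=True)

settings_path = os.path.join(sync_dir, "settings.txt")
with open(settings_path, "w", encoding="utf-8") as settings:
    settings.write("mode=safe\n")
    sync_to_disk(settings)
```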

Countermeasure

When generating important files, force the write to disk so that the file is safe even if an unexpected shutdown occurs. On Linux, a normal shutdown is not a problem, because dirty buffers are written out as part of it.

Here are some concrete code examples. Some error handling is omitted.

sample.c
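#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char* tmp_file_path = "./fsync_test.txt";
    FILE* fp = fopen(tmp_file_path, "w");
    if (fp == NULL) {
        return 1;
    }
    int fd = fileno(fp);
    fputs("fsync() test\n", fp);
    fflush(fp);  // stdio buffer -> OS (still only a dirty buffer)

    // This is the key point: force the dirty buffer to disk.
    int fsync_ret = fsync(fd);

    fclose(fp);
    return fsync_ret;
}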

FsyncTest.java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

public class FsyncTest {
    public static void main(String[] args) throws Exception {
        File file = new File("./fsync.txt");
        try (FileOutputStream output = new FileOutputStream(file)) {
            output.write("Fsync() test\n".getBytes(StandardCharsets.UTF_8));

            // This is the key point: force the data to disk.
            output.getFD().sync();
        }
    }
}

sample.py
import os

with open('./fsync_test.txt', 'w') as f:
    f.write('fsync() test')
    f.flush()  # on its own, this only leaves the data in a dirty buffer

    # This is the key point: force the dirty buffer to disk.
    os.fsync(f.fileno())

fsync.js
const fs = require('fs');
const http = require('http');

const server = http.createServer((request, response) => {
    fs.open('./fsync_test.txt', 'w', (err, fd) => {
        fs.write(fd, 'fsync() test\n', () => {

            // This is the key point: force the data to disk.
            fs.fsyncSync(fd);

            response.writeHead(200, {'Content-Type': 'text/plain'});
            response.end('Write success\n');
            fs.close(fd, () => {});
        });
    });
});

server.listen(9000);

Result 2

The file is no longer corrupted.

By the way, data can stay in a dirty buffer for quite a long time. Depending on the file system, it can be close to 30 seconds. Before the countermeasure, on a Windows 7 machine I had at hand, I wrote a file, opened it in a text editor to confirm the contents were there, unplugged the power 15 seconds later, and found the file broken after the next startup. Trying the same thing in a CentOS 6 environment gave almost the same result.

After the countermeasure, the file was no longer broken, even when the power was cut immediately after writing.

Conclusion

Regardless of language, fsync() or an equivalent is essential when generating important files. However, it forces an extremely slow synchronous disk write, which can severely affect performance under some conditions. Do not call it blindly everywhere just because it is safer.
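
To get a feel for the cost on a particular machine, a small measurement along these lines can help. This is only a sketch: the helper name timed_writes, the write size, and the write count are arbitrary assumptions, and the numbers depend heavily on the disk and file system, so no expected output is given.

```python
import os
import tempfile
import time


def timed_writes(path: str, count: int, use_fsync: bool) -> float:
    """Write count small chunks to path and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(b"x" * 64)
            f.flush()
            if use_fsync:
                os.fsync(f.fileno())  # synchronous disk write on every chunk
    return time.perf_counter() - start


bench_dir = tempfile.mkdtemp()
plain = timed_writes(os.path.join(bench_dir, "plain.bin"), 50, use_fsync=False)
synced = timed_writes(os.path.join(bench_dir, "synced.bin"), 50, use_fsync=True)
print(f"without fsync: {plain:.4f} s, with fsync: {synced:.4f} s")
```

If the difference matters, one option is to call fsync() once after the last write of an important file rather than after every small write.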

Supplement

Related Challenges

I received some links in the comments that are highly relevant to this article, so I am listing them here.

Firefox Challenges

This appears to be a problem caused by fsync() being slow. The article explains that switching to fdatasync() made things faster because the inode is not updated.

PostgreSQL Challenges

https://masahikosawada.github.io/2019/02/17/PostgreSQL-fsync-issue/

If fsync() fails, it is useless to simply call fsync() again. PostgreSQL was apparently changed so that if fsync() fails, it crashes the database and recovers from the transaction log (WAL).

I wondered whether fsync() ever fails in the first place, but apparently it happens easily with SAN and NFS. In an ordinary application, you can keep hold of the data you passed to write() and, on failure, start over again from write().
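
The "start over from write()" idea for an ordinary application might be sketched as follows in Python. The helper name write_with_retry and the attempts parameter are hypothetical. The point is that a failed attempt is retried from the beginning with the data still held in memory, instead of calling fsync() a second time on the same file descriptor.

```python
import os
import tempfile


def write_with_retry(path: str, data: bytes, attempts: int = 3) -> None:
    """Write data durably; on failure, redo the whole write, not just fsync."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            # Reopen and rewrite everything from the data we still hold
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            return
        except OSError as exc:
            last_error = exc
    raise last_error


retry_dir = tempfile.mkdtemp()
retry_path = os.path.join(retry_dir, "state.bin")
write_with_retry(retry_path, b"important state\n")
```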

Support on Windows

fsync() is a POSIX function (Linux and other UNIX-like systems) and is not available in the Windows API. On Windows, the same effect can be achieved with the following API.

BOOL FlushFileBuffers(HANDLE hFile);
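
For illustration, here is one way FlushFileBuffers might be called from Python via ctypes. The helper name flush_to_disk is hypothetical, and the Windows branch is an untested-on-your-machine sketch that assumes the file was opened for writing. In everyday Python code os.fsync() is usually sufficient on both platforms, since on Windows it is documented to call the C runtime's _commit() function.

```python
import os
import sys
import tempfile


def flush_to_disk(f) -> None:
    """Force a Python file object to disk on Windows or POSIX (hypothetical helper)."""
    f.flush()  # Python buffer -> OS
    if sys.platform == "win32":
        import ctypes
        import msvcrt
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.FlushFileBuffers.argtypes = [wintypes.HANDLE]
        kernel32.FlushFileBuffers.restype = wintypes.BOOL
        # Convert the C runtime file descriptor into a Win32 HANDLE
        handle = msvcrt.get_osfhandle(f.fileno())
        if not kernel32.FlushFileBuffers(handle):
            raise ctypes.WinError(ctypes.get_last_error())
    else:
        os.fsync(f.fileno())


flush_dir = tempfile.mkdtemp()
flush_path = os.path.join(flush_dir, "fsync_test.txt")
with open(flush_path, "w", encoding="utf-8") as out:
    out.write("FlushFileBuffers test\n")
    flush_to_disk(out)
```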

In addition, on Windows 7 this can be achieved with an OS-wide setting:

  [1] Open Control Panel
  [2] Open Device Manager
  [3] Under Disk drives, select the disk and open its properties
  [4] On the Policies tab, uncheck "Enable write caching on the device"

Be aware that this approach slows down every operation on the machine, not just your app.

Author by

An IoT engineer currently raising a three-year-old.

Updated on July 01, 2020