Machine Problem 2 - LoserFS: Easy peer-to-peer filesystem

  • Early bird bonus deadline: 11:59 PM, 2018/11/21.
  • Final due date: 11:59 PM, 2018/12/05.
  • No late submissions.

TOC

Introduction

Welcomes LoserFS!

We are going to implement a simple peer-to-peer filesystem from scratch. It’s built on top of gangs of losers we made, so let’s call it LoserFS. To start off, we have to understand how it works.

We upgrade loser from MP1 to loser_peer server at first. It is basically maintains a chain of commits, or log, like loser in MP1. Many loser_peers cooperate together to form a single filesystem. They can look up logs from each other, but can only append new commits to its own log.

But where comes the filesystem? The magic lies in the way loser_peers organize others’ logs into a logical view.

Logical filesystem view

Suppose a loser_peer, Resol, has only one best friend, Reep. Let’s look at their logs.

  • Resol’s log
    # commit 1
    [new_file]
    love_letter.txt
    [modified]
    [copied]
    [deleted]
    (MD5)
    love_letter.txt 51dfcbaeb6beab0729f6aca504cc429b
    (timestamp)
    1541611000000
    
    # commit 2
    [new_file]
    QQ.txt
    [modified]
    [copied]
    [deleted]
    love_letter.txt
    (MD5)
    QQ.txt 7e56035a736d269ad670f312496a0846
    (timestamp)
    1541611500000
    
  • Reep’s log
    # commit 1
    [new_file]
    thanks_good_guy.txt
    [modified]
    [copied]
    [deleted]
    (MD5)
    thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada
    (timestamp)
    1541611200000
    

The log is similar to loser log in MP1, except they are arranged in descending order, and have extra (update) UNIX (timestamp) in milliseconds field to tell its occurrence. You can notice MD5 is calculated only when that file content is affected in that commit. To form a single filesystem, we replay the all commits ordered by timestamp. The resulting logical view is shown below. We eventually have two files. Note that love_letter.txt disappears because it’s deleted during replay. The same rule also applies if there are more than two peers.

  • The logical view
    QQ.txt
    thanks_good_guy.txt
    (MD5)
    QQ.txt 7e56035a736d269ad670f312496a0846
    thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada
    

Conflict resolution

What if Resol and Reep creates the same file or modify a file at the same time? We have a set of ordering and conflict resolution rules to determine which commit goes first. Checkout this document to understand the rules.

(update) Summary

  • Every peer are identical programs, but each of them has a private log.
  • One peer can append commits to its private log, but cannot append on others’. However, they can read the logs from others.
  • By combining logs from all peers, whoever user on each peer can see identical content of the filesystem.

Required Features

The big picture

The big picture of LoserFS

Our architecture supports arbitrary number of loser_peer processes. They connect to each other over UNIX domain sockets. This socket looks like a normal file, and yes, it has a filename, except that input/output to this file is handled by your process. See here for examples.

In this picture, peer1 owns the socket at /tmp/mp2-peer1.sock, peer2 owns /tmp/mp2-peer2.sock, and so on. Peer1 connects to mp2-peer2.sock and other three sockets. Conversely, peer1 accepts connections from other four peers. In this way, peers talks to each other by sending/receiving bytes. You have to design the protocol between the peers.

Every peer provides user command line interface through stdin/stdout. Your program should support list, cp, mv and other several commands. Checkout next section to see details.

File hierarchy and usage

Place a Makefile under our homework directory. The judge will run make to build a loser_peer executable in the same directory.

repo
├── MP0
├── MP1
└── MP2
    ├── Makefile
    └── other files

The usage is ./loser_peer [CONFIG_FILE]. Our program loads specified config file with this example content.

name = resol
peers = reep hsarc gnah
repo = /home/resol/repo

According the the config, it creates a socket at /tmp/mp2-resol.sock on startup, and stores files and commits in /home/resol/repo directory.

  • If the config file is not readable, socket cannot be created, or repo direcotry doesn’t exist or not writable, exit with code EXIT_FAILURE.
  • The program expects (update, sorry for typo) /tmp/mp2-reep.sock and others from peers. If any one of them is missing, the program ignores it and continue to accept user commands.

User commands

We use Resol and Reep example above in introduction section for better clarification.

  • list

    This commands shows the most recent logical view. It lists all filenames in dictionary order and their digest. In the Resol and Reep example above, the output should be:

    QQ.txt
    thanks_good_guy.txt
    (MD5)
    QQ.txt 7e56035a736d269ad670f312496a0846
    thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada
    
  • history [-a]

    Without -a option, it shows the commits history of local repository ordered by timestamp. In our example, the output should be the following for peer Resol:

    # commit 1
    [new_file]
    love_letter.txt
    [modified]
    [copied]
    [deleted]
    (MD5)
    love_letter.txt 51dfcbaeb6beab0729f6aca504cc429b
    (timestamp)
    1541611000000
    
    # commit 2
    [new_file]
    QQ.txt
    [modified]
    [copied]
    [deleted]
    love_letter.txt
    (MD5)
    QQ.txt 7e56035a736d269ad670f312496a0846
    (timestamp)
    1541611500000
    

    With -a option supplied, it prints the merged commit history from all peers. You make to make sure every peer have identical output and obey the ordering rules.

    # commit 1
    [new_file]
    love_letter.txt
    [modified]
    [copied]
    [deleted]
    (MD5)
    love_letter.txt 51dfcbaeb6beab0729f6aca504cc429b
    (timestamp)
    1541611000000
    
    # commit 2
    [new_file]
    thanks_good_guy.txt
    [modified]
    [copied]
    [deleted]
    (MD5)
    thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada
    (timestamp)
    1541611200000
    
    # commit 3
    [new_file]
    QQ.txt
    [modified]
    [copied]
    [deleted]
    love_letter.txt
    (MD5)
    QQ.txt 7e56035a736d269ad670f312496a0846
    (timestamp)
    1541611500000
    

    Note the format differs from MP1. MD5 digest is calculated only for file creation and modification, and extra (timestamp) in milliseconds is provided. (update) The last commit in output does not have extra line breaks.

  • cp SOURCE DEST

    UNIX-style file copy. The SOURCE and DEST paths can be either a local file, such as /etc/passwd, or a file inside LoserFS, prefixed with @. For example, cp /etc/fstab @target.txt copies /etc/fstab from local machine to target.txt in LoserFS. cp @myfile.txt /tmp/target.txt does similarly.

    If SOURCE does not exist, print fail in stdout, otherwise print success once it completes. DEST file may be replaced if it already exists.

  • mv SOURCE DEST

    UNIX-style file renaming. Like cp command, we can specify local machine file and LoserFS file in SOURCE and DEST. If SOURCE does not exist or cannot be un, do nothing but print fail in stdout, otherwise print success once it completes. DEST file may be replaced if it already exists.

  • rm FILE

    UNIX-style file deletion. This command can ONLY removes files in LoserFS. Hence, FILE is always prefixed with @ like rm @myfile.txt.

  • exit

    Cause the program print bye and exit normally. If there’s an ongoing file copy session, the program waits until all sessions completes.

Special notes

  • Only regular files are considered. Directory support is treated as bonus.

  • Suppose a file is copied from peer A to peer B. If source file is modified or deleted before copying completes, it still proceeds with original content.

  • Be aware of concurrency. Commands on peer A should not block user commands on peer B. Also, if multiple peers copy files from A, they should not block each other.

  • Every peer keeps track of others’ logs as soon as possible. Suppose peer A fetches peer B’s log and then B exits. The command history -a on A includes latest commits from B. Also, if B never shows up since A’s startup, we pretend B has empty log until B starts.

  • (update) Users cannot directly make a commit, but peer program decides when and how to make a commit. The only restriction is that every peer can only append new commits to its own log. Be aware of the grading criteria that other peers can see updated log in 0.5 second.

  • (update) The peer should store data in its repo directory specified in config. One grading item requires peers can restore the log and files after restarting. The homework doesn’t assume the format and structure of repo directory, and the judge will never peek the repo.

  • (update) Peers can exchange data ONLY through sockets. Direct copy from others’ repo is not allowed.

  • (update 2) The judge will never touch the repo directory, as well as how you manage your files in repo.

Bonus features

These features are optional and impose no penalty on yout grade.

  • Directory support This features enable us to specify paths like @first_dir/second_dir/my_file.txt. To implement this feature, you don’t need to calculate MD5 digest on directories. You have to provide details in bonus.md on the way you apply or not to apply version control on directories.

  • Alternative to timestamp LoserFS depends on timestamps to sort the commits. It’s critically flawed because peers may have different clock time and different ticking rate. Hence, LoserFS is ignorant of the true occurance order of commits. You can describe your solution in bonus.md and implement the proof-of-concept. The solution should work on mere assumption that peers have their own monotonic clocks.

Judge

Grading

  • (3pt) history [-a] and list with 5 peers in total. Finish both to get full points.
  • cp and mv
    • (1pt) (update 3) Transfer between LoserFS and local machine by cp and mv.
    • (1pt) Cross-peer transfer.
    • (1pt) After the command completes, other peers see latest log in 0.5 second.
    • The points is granted only if both cp and mv achieve that item.
  • (2pt) Concurrent data copy between peers.
  • (1pt) rm command.
  • (1pt) Peers can exit and restart without errors, and restore saved commits.

Bonus

  • You can obtain 3-point early bird bonus if
    • (update 2) Local machine to LoserFS transfer through mv and cp. (LoserFS to local machine is not needed.)
    • Finish history [-a] and list with 2 peers in total.
    • Create empty early.md in mp2 directory before early bird due date.
  • Finish one optional feature mentioned above to get at most 3 points.
    • Write bonus features and its details in bonus.md under mp2 directory.

Guarantees

  • Files are at most 2 GB in size.
  • Only regular files. No directories, symbolic links and hard links.
  • At most 5 peers in total. (update) Peer names, UNIX socekt file addresses and filenames are up to PATH_MAX in length. (update 2) Filenames consist of printable ASCII except space characters (space, TAB, etc).
  • Each line in config file and command output is ended with \n.

Example test cases

TA acknowledges the openess of this spec. Please take a look at examples test cases. Hope it make things more clear.

Resources