Machine Problem 2 - LoserFS: Easy peer-to-peer filesystem
- Early bird bonus deadline: 11:59 PM, 2018/11/21.
- Final due date: 11:59 PM, 2018/12/05.
- No late submissions.
TOC
Introduction
Welcomes LoserFS!
We are going to implement a simple peer-to-peer filesystem from scratch. It’s built on top of gangs of losers we made, so let’s call it LoserFS. To start off, we have to understand how it works.
We upgrade loser from MP1 to loser_peer server at first. It is basically maintains a chain of commits, or log, like loser in MP1. Many loser_peers cooperate together to form a single filesystem. They can look up logs from each other, but can only append new commits to its own log.
But where comes the filesystem? The magic lies in the way loser_peers organize others’ logs into a logical view.
Logical filesystem view
Suppose a loser_peer, Resol, has only one best friend, Reep. Let’s look at their logs.
- Resol’s log
# commit 1 [new_file] love_letter.txt [modified] [copied] [deleted] (MD5) love_letter.txt 51dfcbaeb6beab0729f6aca504cc429b (timestamp) 1541611000000 # commit 2 [new_file] QQ.txt [modified] [copied] [deleted] love_letter.txt (MD5) QQ.txt 7e56035a736d269ad670f312496a0846 (timestamp) 1541611500000
- Reep’s log
# commit 1 [new_file] thanks_good_guy.txt [modified] [copied] [deleted] (MD5) thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada (timestamp) 1541611200000
The log is similar to loser log
in MP1, except they are arranged in descending order, and have extra (update) UNIX (timestamp) in milliseconds field to tell its occurrence. You can notice MD5 is calculated only when that file content is affected in that commit. To form a single filesystem, we replay the all commits ordered by timestamp. The resulting logical view is shown below. We eventually have two files. Note that love_letter.txt
disappears because it’s deleted during replay. The same rule also applies if there are more than two peers.
- The logical view
QQ.txt thanks_good_guy.txt (MD5) QQ.txt 7e56035a736d269ad670f312496a0846 thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada
Conflict resolution
What if Resol and Reep creates the same file or modify a file at the same time? We have a set of ordering and conflict resolution rules to determine which commit goes first. Checkout this document to understand the rules.
(update) Summary
- Every peer are identical programs, but each of them has a private log.
- One peer can append commits to its private log, but cannot append on others’. However, they can read the logs from others.
- By combining logs from all peers, whoever user on each peer can see identical content of the filesystem.
Required Features
The big picture
Our architecture supports arbitrary number of loser_peer processes. They connect to each other over UNIX domain sockets. This socket looks like a normal file, and yes, it has a filename, except that input/output to this file is handled by your process. See here for examples.
In this picture, peer1 owns the socket at /tmp/mp2-peer1.sock
, peer2 owns /tmp/mp2-peer2.sock
, and so on. Peer1 connects to mp2-peer2.sock
and other three sockets. Conversely, peer1 accepts connections from other four peers. In this way, peers talks to each other by sending/receiving bytes. You have to design the protocol between the peers.
Every peer provides user command line interface through stdin/stdout. Your program should support list, cp, mv and other several commands. Checkout next section to see details.
File hierarchy and usage
Place a Makefile
under our homework directory. The judge will run make
to build a loser_peer
executable in the same directory.
repo
├── MP0
├── MP1
└── MP2
├── Makefile
└── other files
The usage is ./loser_peer [CONFIG_FILE]
. Our program loads specified config file with this example content.
name = resol
peers = reep hsarc gnah
repo = /home/resol/repo
According the the config, it creates a socket at /tmp/mp2-resol.sock
on startup, and stores files and commits in /home/resol/repo
directory.
- If the config file is not readable, socket cannot be created, or repo direcotry doesn’t exist or not writable, exit with code
EXIT_FAILURE
. - The program expects (update, sorry for typo)
/tmp/mp2-reep.sock
and others from peers. If any one of them is missing, the program ignores it and continue to accept user commands.
User commands
We use Resol and Reep example above in introduction section for better clarification.
-
list
This commands shows the most recent logical view. It lists all filenames in dictionary order and their digest. In the Resol and Reep example above, the output should be:
QQ.txt thanks_good_guy.txt (MD5) QQ.txt 7e56035a736d269ad670f312496a0846 thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada
-
history [-a]
Without
-a
option, it shows the commits history of local repository ordered by timestamp. In our example, the output should be the following for peer Resol:# commit 1 [new_file] love_letter.txt [modified] [copied] [deleted] (MD5) love_letter.txt 51dfcbaeb6beab0729f6aca504cc429b (timestamp) 1541611000000 # commit 2 [new_file] QQ.txt [modified] [copied] [deleted] love_letter.txt (MD5) QQ.txt 7e56035a736d269ad670f312496a0846 (timestamp) 1541611500000
With
-a
option supplied, it prints the merged commit history from all peers. You make to make sure every peer have identical output and obey the ordering rules.# commit 1 [new_file] love_letter.txt [modified] [copied] [deleted] (MD5) love_letter.txt 51dfcbaeb6beab0729f6aca504cc429b (timestamp) 1541611000000 # commit 2 [new_file] thanks_good_guy.txt [modified] [copied] [deleted] (MD5) thanks_good_guy.txt 9e204f247de876733be099d5fbfa9ada (timestamp) 1541611200000 # commit 3 [new_file] QQ.txt [modified] [copied] [deleted] love_letter.txt (MD5) QQ.txt 7e56035a736d269ad670f312496a0846 (timestamp) 1541611500000
Note the format differs from MP1. MD5 digest is calculated only for file creation and modification, and extra (timestamp) in milliseconds is provided. (update) The last commit in output does not have extra line breaks.
-
cp SOURCE DEST
UNIX-style file copy. The SOURCE and DEST paths can be either a local file, such as
/etc/passwd
, or a file inside LoserFS, prefixed with @. For example,cp /etc/fstab @target.txt
copies/etc/fstab
from local machine totarget.txt
in LoserFS.cp @myfile.txt /tmp/target.txt
does similarly.If SOURCE does not exist, print
fail
in stdout, otherwise printsuccess
once it completes. DEST file may be replaced if it already exists. -
mv SOURCE DEST
UNIX-style file renaming. Like
cp
command, we can specify local machine file and LoserFS file in SOURCE and DEST. If SOURCE does not exist or cannot be un, do nothing but printfail
in stdout, otherwise printsuccess
once it completes. DEST file may be replaced if it already exists. -
rm FILE
UNIX-style file deletion. This command can ONLY removes files in LoserFS. Hence, FILE is always prefixed with @ like
rm @myfile.txt
. -
exit
Cause the program print
bye
and exit normally. If there’s an ongoing file copy session, the program waits until all sessions completes.
Special notes
-
Only regular files are considered. Directory support is treated as bonus.
-
Suppose a file is copied from peer A to peer B. If source file is modified or deleted before copying completes, it still proceeds with original content.
-
Be aware of concurrency. Commands on peer A should not block user commands on peer B. Also, if multiple peers copy files from A, they should not block each other.
-
Every peer keeps track of others’ logs as soon as possible. Suppose peer A fetches peer B’s log and then B exits. The command
history -a
on A includes latest commits from B. Also, if B never shows up since A’s startup, we pretend B has empty log until B starts. -
(update) Users cannot directly make a commit, but peer program decides when and how to make a commit. The only restriction is that every peer can only append new commits to its own log. Be aware of the grading criteria that other peers can see updated log in 0.5 second.
-
(update) The peer should store data in its repo directory specified in config. One grading item requires peers can restore the log and files after restarting. The homework doesn’t assume the format and structure of repo directory, and the judge will never peek the repo.
-
(update) Peers can exchange data ONLY through sockets. Direct copy from others’ repo is not allowed.
-
(update 2) The judge will never touch the repo directory, as well as how you manage your files in repo.
Bonus features
These features are optional and impose no penalty on yout grade.
-
Directory support This features enable us to specify paths like
@first_dir/second_dir/my_file.txt
. To implement this feature, you don’t need to calculate MD5 digest on directories. You have to provide details inbonus.md
on the way you apply or not to apply version control on directories. -
Alternative to timestamp LoserFS depends on timestamps to sort the commits. It’s critically flawed because peers may have different clock time and different ticking rate. Hence, LoserFS is ignorant of the true occurance order of commits. You can describe your solution in
bonus.md
and implement the proof-of-concept. The solution should work on mere assumption that peers have their own monotonic clocks.
Judge
Grading
- (3pt)
history [-a]
andlist
with 5 peers in total. Finish both to get full points. cp
andmv
- (1pt) (update 3) Transfer between LoserFS and local machine by
cp
andmv
. - (1pt) Cross-peer transfer.
- (1pt) After the command completes, other peers see latest log in 0.5 second.
- The points is granted only if both
cp
andmv
achieve that item.
- (1pt) (update 3) Transfer between LoserFS and local machine by
- (2pt) Concurrent data copy between peers.
- (1pt)
rm
command. - (1pt) Peers can
exit
and restart without errors, and restore saved commits.
Bonus
- You can obtain 3-point early bird bonus if
- (update 2) Local machine to LoserFS transfer through
mv
andcp
. (LoserFS to local machine is not needed.) - Finish
history [-a]
andlist
with 2 peers in total. - Create empty
early.md
inmp2
directory before early bird due date.
- (update 2) Local machine to LoserFS transfer through
- Finish one optional feature mentioned above to get at most 3 points.
- Write bonus features and its details in
bonus.md
undermp2
directory.
- Write bonus features and its details in
Guarantees
- Files are at most 2 GB in size.
- Only regular files. No directories, symbolic links and hard links.
- At most 5 peers in total. (update) Peer names, UNIX socekt file addresses and filenames are up to PATH_MAX in length. (update 2) Filenames consist of printable ASCII except space characters (space, TAB, etc).
- Each line in config file and command output is ended with
\n
.
Example test cases
TA acknowledges the openess of this spec. Please take a look at examples test cases. Hope it make things more clear.
Resources
- Download the example code to boostrap your work.
- You can either start from your MP1 or this example MP1 code.
- UNIX domain socket tutorial
- Remeber to take a look at conflict resolution rules