An academic exercise that implements a FUSE file-system using Couchbase as the data store.
This is currently just a fun/academic/experimental project that I started to help me learn more about FUSE and to gain some first-hand experience with libcouchbase (the C SDK for Couchbase) after seeing all of the new and exciting features coming in Couchbase Server 7.0.
The idea is to use Couchbase Server as a distributed data store for a user-space file system built with FUSE. Because this was just a fun exercise, I wasn't focused on efficient file system design: I have not put much thought into optimizing key access patterns, nor have I worried about distributed locks, CAS, etc. Network access could also be optimized using transactions or batch operations. Those topics would need to be addressed for this to become a more robust and useful distributed filesystem.
If you're looking for an actual distributed file server built on top of Couchbase, check out cbfs on CouchbaseLabs instead.
- I am currently using the FUSE high-level operations to create a logical overlay of a filesystem.
- I have not fully tested FUSE in the normal multi-threaded daemon mode of operation (only tested with `-f -s` so far).
- All of the calls to Couchbase are currently synchronous and I haven't optimized batch calls or looked into transactions.
- Currently only developed and tested on macOS using macFUSE for convenience.
- Paths are currently limited to 250 characters (the maximum length of a Couchbase key) but I have plans to expand that.
- Here are some thoughts:
- Must try to take advantage of Couchbase keys for quick lookup (and future improvements I want to explore).
- Must only use more expensive operations/techniques when needed (e.g., when path is larger than 250 characters).
- Must support at least 4096 character upper limit (the current path limit for ext4 file systems).
- I want to avoid more time consuming lookup strategies that require multiple trips (e.g., path keys, collision documents).
- However, using a counter may be useful if the solution is fast and results in fewer calls and less complex keys.
- Current idea:
- When path <= 250 characters:
- Just use it because it's already unique.
- When path > 250 characters:
- An XXH128 hash is computed over the entire path and encoded as a 22 character Base64 string.
- The next 228 characters are samples taken from the path to help make the key unique.
- From a `path` of size `n` (where `n > 250`) and starting at `[0]`, the samples are taken from:
- 30 chars starting at: `[1]`
- 48 chars starting at: `[n*0.25]`
- 50 chars starting at: `[n*0.50]`
- 50 chars starting at: `[n*0.75]`
- 50 chars starting at: `[n-51]`
- Note that the sample positions scale with the path length, so the samples are spread throughout the string. This is important because some storage patterns produce paths that share common sub-structures and differ only early in the string (or vice versa).
- XXH128 itself has practically zero chance of collision (see: https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison).
- The combination of XXH128 plus these character samples, with paths up to 4096, bounds the limits fairly well.
- Collisions should be practically impossible, but I'll leave the math to prove it as an exercise for the theoretical Computer Scientists.
- REMINDER: This is all because I want to use the high-level operations, which are heavily based on the `path` string. To avoid multiple calls, I need to obtain a unique key on the client without reverting to strategies that would require multiple trips. When using the high-level operations, this is true for all operations, but it's especially true for operations like `getattr` which are called very frequently.
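
The key derivation described in the list above can be sketched in C as follows. This is a minimal sketch and not the project's actual code: the names `digest_to_base64` and `make_key` are hypothetical, the standard Base64 alphabet is an assumption, and fractional offsets are truncated with integer division (also an assumption). The 16-byte digest is expected to be computed elsewhere, for example with xxHash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_MAX  250  /* key length limit stated in the text */
#define HASH_LEN 22   /* 16 bytes -> 22 Base64 chars without padding */

/* Hypothetical helper: encode a 16-byte digest as 22 unpadded Base64
 * characters (standard alphabet; the alphabet choice is an assumption). */
static void digest_to_base64(const uint8_t digest[16], char out[HASH_LEN])
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;
    for (size_t i = 0; i < 15; i += 3) {  /* five full 3-byte groups */
        uint32_t v = ((uint32_t)digest[i] << 16) |
                     ((uint32_t)digest[i + 1] << 8) |
                     (uint32_t)digest[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    /* Last byte: 8 bits become 2 chars (6 bits, then 2 bits zero-filled). */
    out[o++] = tbl[digest[15] >> 2];
    out[o++] = tbl[(digest[15] & 3) << 4];
}

/* Hypothetical helper: build a key of at most KEY_MAX characters.
 * Short paths are used as-is; long paths become hash + five samples.
 * Returns the key length, or 0 if the output buffer is too small. */
size_t make_key(const char *path, const uint8_t digest[16],
                char *out, size_t out_size)
{
    assert(path != NULL && digest != NULL && out != NULL);
    if (out_size < KEY_MAX + 1)
        return 0;

    size_t n = strlen(path);
    if (n <= KEY_MAX) {               /* already unique, use it directly */
        memcpy(out, path, n + 1);
        return n;
    }

    /* Sample positions and lengths from the list: 30+48+50+50+50 = 228. */
    const size_t start[5] = { 1, n / 4, n / 2, (n * 3) / 4, n - 51 };
    const size_t count[5] = { 30, 48, 50, 50, 50 };

    char *p = out;
    digest_to_base64(digest, p);
    p += HASH_LEN;
    for (int i = 0; i < 5; i++) {
        memcpy(p, path + start[i], count[i]);
        p += count[i];
    }
    *p = '\0';
    return (size_t)(p - out);         /* 22 + 228 = 250 */
}
```

With xxHash 0.8.0, the digest could plausibly be obtained from `XXH3_128bits` followed by `XXH128_canonicalFromHash`, but that wiring is left out here to keep the sketch free of external dependencies.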
- Could be cleaner - this was just a quick mash up for experimentation and fun. I may improve it in the future if I have time or other ideas to explore.
- Tested primarily on macOS Big Sur (11.4) (x86).
- I tried to keep the CMake config "clean" so it should only require a little TLC to build for other Unix-based operating systems.
- FUSE on Windows is a different story and Dokan is probably a better approach (rewrite or using the FUSE wrapper utility).
- Install cmake utility
brew install cmake
- Install Couchbase Server 7.0
- Install Couchbase C library
brew install libcouchbase
- IMPORTANT: You'll need v3.1.0 until the next 7.0 RC is released!! (more info)
- I installed v3.1.0 by downloading source and doing a manual build.
- Install FUSE
brew install macfuse
- Tested with macfuse (v4.1.2) == FUSE (v2.9)
- Install cJSON
brew install cjson
- Tested with v1.7.14
- Install xxHash
brew install xxhash
- Tested with 0.8.0
- Install Visual Studio Code (optional)
- A decent general purpose IDE with lots of useful extensions.
- https://code.visualstudio.com/
- Build cbfuse
- see environment setup instructions above
mkdir build; cd build
cmake ..
cmake --build . --config Release
- Setup Couchbase
- Start the Couchbase server
- Create a bucket (e.g., `cbfuse`)
- See `./scripts/setup.sh` which does the following in `cbfuse`:
- Under the `_default` Scope, add Collections:
- `stats` - used for basic file stat attributes
- `blocks` - used to store file data blocks
- `dentries` - used to store directory entry info
- Running a quick debug test
- This filesystem runs in the foreground and is single-threaded.
- Mount the filesystem
./cbfuse/cbfuse ~/cbfuse --cb_connect=couchbase://127.0.0.1/cbfuse --cb_username=raycardillo --cb_password=raycardillo
- Unmount the filesystem
umount ~/cbfuse