# Tim Howes -- File syncing tools - syncthing, dat, git-annex

May 2, 2018 at 5-6:30pm in BIDS, 190 Doe Library

## File syncing tools

I will discuss open source tools that you can use to sync files directly between computers, rather than relying on paid cloud services such as Dropbox. These tools are especially useful for large scientific datasets, which may be impractical to sync to the cloud and for which you may want more control over versioning information. If you want something similar to a cloud service, but with more control, you can set up these tools on your own virtual private server.

### syncthing

syncthing is a cross-platform tool that can be used to keep folders in sync between your own devices or to share with collaborators. The settings can be customized to ignore certain files or sub-directories on specific machines, and there are different options available for keeping copies of old versions of files.

### dat

Dat is a protocol for peer-to-peer sharing of collections of files. It has similar advantages to sharing files using BitTorrent, but it also lets you update the files in an archive and track the version history.

### git-annex

git-annex is a tool that allows you to track large files within your git repositories, and it gives you a high level of control over which clones of the repository actually get the full file contents and which get only small placeholder files. This means that you can view and organize the full directory tree on your local machine without having to actually download all the files, and you can download the contents of individual files when needed using “git-annex get”. A special git-annex branch tracks the locations of the file contents and ensures that the correct number of copies exist on other machines before “dropping” the local file.
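To make the placeholder idea concrete: in git-annex's default (locked) mode, a placeholder is a symlink into `.git/annex/objects/`, and the object it points to is named by a key derived from the file's size and checksum. The function below is a minimal sketch of how the default SHA256E backend builds that key. It assumes GNU coreutils (`sha256sum`) and a short, plain extension; git-annex applies extra rules for long or unusual extensions, and the name `annex_key` is hypothetical, not part of git-annex.

```bash
# Hypothetical helper (not part of git-annex): sketch of the key that the
# default SHA256E backend assigns to a file. Assumes GNU coreutils and a
# short extension; git-annex itself applies extra rules for unusual names.
annex_key() {
  local file=$1 name size hash ext
  # File size in bytes (tr strips padding that some wc versions add)
  size=$(wc -c < "$file" | tr -d ' ')
  # SHA-256 of the file contents: first field of sha256sum's output
  hash=$(sha256sum "$file" | cut -d ' ' -f 1)
  # The "E" in SHA256E: keep the extension, so an .iso key ends in .iso
  name=$(basename "$file")
  case "$name" in
    *.*) ext=".${name##*.}" ;;
    *)   ext="" ;;
  esac
  printf 'SHA256E-s%s--%s%s\n' "$size" "$hash" "$ext"
}
```

For example, `annex_key ubuntu.iso` prints a string of the form `SHA256E-s<size>--<64 hex digits>.iso`. Because the key depends only on the contents, every clone can agree on which object a placeholder refers to without holding the object itself. For a file that is already annexed, `git annex lookupkey` should report the key git-annex actually uses.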

## Usage notes

### syncthing

Syncthing keeps folders in sync between machines by making a secure, direct connection between the machines (or optionally by using relay servers if a direct connection is not possible). It is a simple tool that can be started at the command line, run in the background, and viewed/controlled via a web browser.

#### Installation

https://docs.syncthing.net/intro/getting-started.html https://docs.syncthing.net/users/autostart.html

Install and enable on Ubuntu:

```bash
sudo apt install syncthing

# Enable as automatic background service (replace myuser with your username)
sudo systemctl enable syncthing@myuser.service
sudo systemctl start syncthing@myuser.service

# or run syncthing manually on the command line
syncthing
```

Check status on Ubuntu:

```bash
# Check service status
sudo systemctl status syncthing@myuser.service

# Check logs
sudo journalctl -e -u syncthing@myuser.service
```

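Install and enable on macOS:
(First install Homebrew: https://brew.sh/)

```bash
brew install syncthing

# Enable as automatic background service; $(brew --prefix syncthing) points
# at the installed version, so no version number is needed in the path
cp "$(brew --prefix syncthing)/homebrew.mxcl.syncthing.plist" ~/Library/LaunchAgents/syncthing.plist
launchctl load ~/Library/LaunchAgents/syncthing.plist
# (alternatively, let Homebrew manage the agent: brew services start syncthing)

# or run syncthing manually on the command line
syncthing
```
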
You may need to adjust firewall settings to allow incoming connections. On Mac, you will usually be prompted to allow this the first time you start syncthing.

https://docs.syncthing.net/users/firewall.html

#### Connect to a new machine

Visit http://localhost:8384 to view the web GUI of your running syncthing instance.

Click “Add remote device” and enter the device’s long unique ID. If you’re on the same local network as the other device, it will show up as a suggestion so you don’t have to type it.

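Give the device whatever nickname you like. Specify its IP address (if it is stable) or leave the address as ‘dynamic’ to find the device automatically based on its ID. Choose which folders to share with the device. Check ‘Introducer’ if you want this device to introduce you to the other devices it shares your common folders with, so that they are added automatically; to have folders offered by the device accepted automatically, newer versions of syncthing provide an ‘Auto Accept’ option.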

https://docs.syncthing.net/intro/getting-started.html#configuring

#### Ignore files

https://docs.syncthing.net/users/ignoring.html
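Ignore patterns go in a file named `.stignore` at the root of the synced folder. The `.stignore` file is not itself synced, which is what lets each machine ignore different files or sub-directories. A minimal sketch, with hypothetical folder and file names:

```text
// Lines starting with // are comments
// A leading slash anchors the pattern to the folder root:
// only the top-level scratch directory is ignored
/scratch
// A pattern without a leading slash also matches in subdirectories
*.swp
// (?d) lets syncthing delete this file if it blocks removing a directory
(?d).DS_Store
// The first matching pattern wins, so this exception must come
// before the broader *.log pattern below
!important.log
*.log
```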

#### Keep old versions

https://docs.syncthing.net/users/versioning.html
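Versioning is set per folder, normally in the web GUI under the folder's Edit dialog (File Versioning). The choice is stored in syncthing's `config.xml`; with simple versioning keeping the last five versions of each file, the folder's entry should contain something like the fragment below. Versioning applies to files that are changed or deleted by another device, and the old copies are moved into a `.stversions` directory inside the synced folder. If you edit `config.xml` by hand, do it while syncthing is stopped.

```xml
<!-- Inside the <folder> element of the shared folder -->
<versioning type="simple">
    <param key="keep" val="5"></param>
</versioning>
```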

#### Other tips

• Set up a virtual private server on a cloud provider if you want to have an always-on machine that can act as the central hub.

• If syncing files between Mac and Linux, watch out for case sensitivity: Linux filesystems are case-sensitive, while macOS filesystems are case-insensitive by default, so two files whose names differ only in case (e.g. Data.csv and data.csv) cannot coexist on the Mac. You can create a new APFS volume on your Mac hard drive with case sensitivity enabled and put your sync folders there to avoid issues.

• If running on a server where you don’t have root access, download and run syncthing manually or enable as a user service.

https://docs.syncthing.net/users/autostart.html#using-systemd
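Before syncing a folder from Linux into a case-insensitive location on a Mac, it can help to check whether it contains names that differ only in case. A minimal sketch using standard `find`, `sort` and `uniq`; the function name is hypothetical, and the comparison follows these tools' notion of case (reliable for ASCII names), which may not match macOS exactly for other characters.

```bash
# Hypothetical helper: print paths under a directory (default: current one)
# that would collide on a case-insensitive filesystem.
find_case_collisions() {
  # sort -f folds case, so names that differ only in case end up adjacent;
  # uniq -d -i then prints one line for each group of such duplicates.
  find "${1:-.}" | sort -f | uniq -d -i
}
```

For example, `find_case_collisions ~/Sync` prints one line per group of colliding names, and empty output means there are no conflicts; rename any reported files before syncing the folder to the Mac.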

### dat

https://docs.datproject.org/tutorial

Resources for data sharing with dat: https://datbase.org/ https://blog.datproject.org/tag/science/

Beaker, a web browser based on dat that enables peer-to-peer, editable websites: https://beakerbrowser.com/ https://beakerbrowser.com/2017/06/14/forking-websites-on-the-p2p-web.html

### git-annex

http://git-annex.branchable.com/walkthrough

#### Example setup

Initialize a repository:

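```bash
mkdir project
cd project
git init
git annex init --version=6 "My desktop"

# Copy in a large file and add it to the annex (not plain git) before committing
cp ~/Downloads/ubuntu.iso .
git annex add ubuntu.iso
git commit -m "Added a file"
```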
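Instead of deciding file by file, you can tell git-annex which files count as large with the `annex.largefiles` setting, which can be placed in a `.gitattributes` file at the root of the repository. With a fragment like the one below, `git annex add` should put files over 100 kB into the annex and add smaller files to git as usual; the threshold and file types are only illustrative.

```text
# Files larger than 100 kilobytes go into the annex
* annex.largefiles=(largerthan=100kb)
# Later lines override earlier ones: keep code and notes in plain git regardless of size
*.py annex.largefiles=nothing
*.md annex.largefiles=nothing
```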


Clone into another folder on the same computer (this could be a removable drive):

```bash
cd /media/usb
git clone ~/project
cd project
git annex init --version=6 "Portable drive"
```


Sync between clones (this takes care of committing, pushing, and pulling):

```bash
cd /media/usb/project
git annex sync

# To also transfer the content of large files in this step, use --content
git annex sync --content
```


#### git-annex assistant

Automated sync tool with a GUI

https://git-annex.branchable.com/assistant/