Open Source Software Technical Articles

Unison Makes Two-Way File Sync Simple

  
  
  
Unison is a powerful file and directory synchronization tool that, unlike popular tools like rsync, can perform two-way synchronization. With Unison, you can sync files between your work computer and home desktop, and any changes done at one machine will be reflected at the other.

Unison runs on a variety of operating systems, including Linux, Windows, Solaris, and Mac OS X, and you can sync files across machines running different platforms. However, Unison is version-sensitive – if versions are not the same across the machines, there is a high probability that files will not be synchronized properly.

Installing Unison is easy. If you are running a major Red Hat- or Debian-based Linux distribution like Fedora or Ubuntu, you'll find it in the standard repository. Enterprise Linux users, including those running CentOS and Scientific Linux, can install Unison from the Extra Packages for Enterprise Linux (EPEL) repository. If your package manager cannot install it (for example, if # yum install unison fails on a Red Hat-based system, or # apt-get install unison fails on a Debian-based one), you can download the source and build the application yourself.


To begin synchronizing files between two machines, you need to ensure that the machines can connect to each other via SSH. If you want to maintain a directory of identical files across machines, I recommend you first copy the relevant files from their current directory to the destination using scp or another tool to ensure that both machines start out with identical content. Synchronization using SSH requires that the remote machine run sshd and have Unison installed. To start synchronizing files, on the local machine, run a command like:
$ unison ssh://user@remote_server//data_backup/ data_dir/

This command synchronizes the data_backup directory on the remote server with data_dir on the local machine. Unison will present you with GUI dialog boxes to enter a password (if you are not using key-based SSH authentication) and with details about synchronization progress. If you are annoyed by the GUI dialog boxes, you can turn them off by setting the -ui flag to text:
$ unison -ui text ssh://user@remote_server//data_backup/ data_dir/

When you fire off a command like the one above, Unison asks you several questions related to synchronization, including properties like time and permission synchronization. This means that you can preserve file modification time stamps along with user permissions while synchronizing. If you are planning to run the same command every day, or if you have a lot of files to synchronize, you can avoid providing answers by creating a preference file that Unison will read for its answers. At a bare minimum, a preference file specifies the two root directories to be synchronized. Note that there is no source and destination for Unison – it is a two-way sync:
root = data_dir/
root = ssh://user@remote_server//data_backup/

Save the file as ~/.unison/test.prf, and you can then invoke unison test; you won't need to specify the whole command.

What if you want to synchronize data_dir/a/ and data_dir/b/ but not data_dir/c/ or any other directory in data_dir/? Add these lines to test.prf:
path = a/
path = b/

We still haven't stopped Unison from bugging us with dialogues and questions. To do that, add batch and auto to enable Unison to act on its own. You might also want to ignore any file with the extension .bak and files beginning with "~":
batch = true
auto = true

ignore = Name *.bak
ignore = Name ~*

If you are synchronizing a Windows and a Linux box, include perm = 0 in the test.prf file to accommodate file permissions that behave differently in the two operating systems.
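
Putting the pieces above together, the whole profile can be written from the shell. This is only a minimal sketch, assuming a POSIX shell: the roots and paths are the placeholder values used in this article, and the script overwrites any existing test.prf. It honors the UNISON environment variable, which Unison uses to locate its preference directory when set, and otherwise falls back to ~/.unison.

```shell
# Directory where Unison looks for profiles (~/.unison unless UNISON is set).
profile_dir="${UNISON:-$HOME/.unison}"
mkdir -p "$profile_dir"

# Write the profile. The quoted 'EOF' keeps the shell from expanding anything
# inside the here-document. All values are the article's placeholders.
cat > "$profile_dir/test.prf" <<'EOF'
root = data_dir/
root = ssh://user@remote_server//data_backup/

path = a/
path = b/

batch = true
auto = true

ignore = Name *.bak
ignore = Name ~*
EOF

echo "Wrote $profile_dir/test.prf"
```

Once the file is in place, unison test picks up every setting without prompting.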


If you are synchronizing large replicas or roots, you might find it convenient to synchronize only the parts that you know have changes. You can break your preference file into smaller chunks. Create a file named parent in the ~/.unison/ directory and specify the root directories you want to sync there. Use another file, say patha.prf, for synchronizing only subdirectory a/, and pathb.prf for synchronizing b/. The prf files will look like this:
$ cat ~/.unison/parent
root = data_dir/
root = ssh://user@remote_server//data_backup/

batch = true

ignore = Name *.bak

$ cat ~/.unison/patha.prf
path = a/
include parent

$ cat ~/.unison/pathb.prf
path = b/
include parent

$ cat ~/.unison/test.prf
include parent

Now if you just want to synchronize the directory data_dir/a/, you need not do a full synchronization. Running unison patha will suffice.

Sockets vs. SSH


You can create a Unison server for anonymous users using sockets instead of SSH. A socket is an endpoint of a bidirectional interprocess communication flow across an IP-based network. Using sockets is a faster way of synchronizing, but the tradeoff is the security of your data, because, unlike SSH, socket connections are not encrypted. If you want to use sockets, run Unison on the remote server on any port greater than 1024 that is not already in use; if the port is taken, you will get an error. You can run the following command to start the server at any time, or put it in a script invoked during startup if you want a persistent Unison server:
unison -socket 2244

Modify your client's test.prf file. Change the root that's using the ssh path to root = socket://unison-server:2244//data_backup/.

When performing updates, both Unison and rsync use an algorithm that transfers only the parts of the files that have actually changed. You can turn off this algorithm and force Unison to transfer complete files. There is no reason to do this under normal circumstances, but you might find this technique useful while debugging hashing issues or when you are running low on memory and want to avoid hashing overhead. To turn off the algorithm that transfers only changes and thus transfer whole files, set the -rsync flag to false:
unison -rsync=false ssh://user@remote_server//data_backup/ data_dir/


Unison works by executing the fstat system call on each file, which returns information about the file, including its inode, size, and owner. This means that the total number of inodes, more than the amount of data to be transferred, is the major performance bottleneck. You can therefore synchronize a single 1GB file faster than one hundred 10MB files. Unison checks the inode number and modification time of a file to see whether it has changed and thus needs to be transferred. You can fool Unison with the touch command on Linux, or its equivalent elsewhere. Running touch file-name changes the timestamp on the file, so you can use this command to make Unison sync a particular file even if you didn't otherwise update it.
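
To see what touch actually changes, here is a small sketch that does not involve Unison at all. It assumes a shell whose test command supports the -nt ("newer than") comparison, as bash and dash do; the file names report.txt and reference are made up for the illustration.

```shell
# Work in a scratch directory so no real files are affected.
tmp=$(mktemp -d)

# Create two files and backdate both to 1 January 2020, 00:00.
touch -t 202001010000 "$tmp/report.txt"
touch -t 202001010000 "$tmp/reference"

# Bump only report.txt's modification time to "now"; its (empty) contents
# are left exactly as they were.
touch "$tmp/report.txt"

# Compare modification times: report.txt should now be newer than reference.
if [ "$tmp/report.txt" -nt "$tmp/reference" ]; then
  touched=yes
else
  touched=no
fi
echo "modification time changed: $touched"
```

The contents are identical before and after; only the modification time moved, which is the property the article says Unison checks.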

Unison by default identifies machines by their hostnames, which are set by the operating system or derived from their IP addresses, which are often allocated by DHCP. Since a client's IP address might change, it is a good idea to define the environment variable UNISONLOCALHOSTNAME in each machine's .bashrc file. Unison will then identify the machine using this variable irrespective of the hostname.
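
For example, a line like the following could go in ~/.bashrc. The value home-desktop is a hypothetical label, not something Unison requires; the assumption here is simply that each machine gets its own name and keeps it.

```shell
# Add to ~/.bashrc on this machine. "home-desktop" is a hypothetical label;
# give every machine taking part in the sync its own stable name.
export UNISONLOCALHOSTNAME="home-desktop"
```

The setting takes effect in new shells, or in the current one after running source ~/.bashrc.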

Unison also creates a hash of the data to be synchronized and then compares it across the machines to find the changes. This is one of the major reasons that you should initially copy the data before starting Unison for synchronization. If you do that, only a minimal number of changes will propagate. Ext3 and similar filesystems have been known to provide better performance than FAT while hashing, though you can use the fastcheck option on Windows to boost the hashing speed. You also need to ensure that if you are running Unison across platforms, you should invoke Unison and control it from Linux only, because Unison is known for not handling Unicode characters properly on non-Linux platforms.

While Unison has many advantages over rsync, you can make it behave like rsync for one-way sync by using the -force flag, which takes one of the two roots as its argument and resolves every difference in favor of that root. You might want to do this when you have multiple machines to synchronize. You can arrange the clients in a virtual star topology and sync every node from one master. However, if you are sure you will need only one-way replication, you might as well just use rsync; Unison is overkill in this case and will use more CPU cycles than rsync.

All of the above should be enough to get you started with Unison. You can customize the software further and do a lot more with this utility. To learn more about Unison, check out the Unison Manual.




This work is licensed under a Creative Commons Attribution 3.0 Unported License
Creative Commons License.





Comments

Do you have a source or any information to point to regarding this statement? 
 
"You also need to ensure that if you are running Unison across platforms, you should invoke Unison and control it from Linux only, because Unison is known for not handling Unicode characters properly on non-Linux platforms." 
 
The official documentation on cross-platform syncing warns of filename case sensitivity and illegal characters in filenames for windows, but makes no mention of problems with Unicode: 
 
http://www.cis.upenn.edu/~bcpierce/unison/download/releases/stable/unison-manual.html#crossplatform
Posted @ Thursday, January 10, 2013 2:24 PM by Kevin C.

"Do you have a source or any information to point to regarding this statement?" 
 
I have experienced this issue with Unison, so though I have no reference, I can confirm that it is true. 
 
The other issue is that often it does not notice new files (on the other hand if I copy them manually, then it detects that the dates do not match).
Posted @ Monday, February 18, 2013 1:44 PM by *Tom