File synchronization tools

File synchronization involves updating files in two or more locations to ensure they contain the same data. Common tools and utilities for this purpose include copy protocols, network synchronization utilities, and system-level daemons designed to keep data consistent across different hosts.

Remote Copy Utilities

SCP (Secure Copy) is a command-line utility that copies files and directories between hosts over an SSH connection. With the -r flag it copies directories recursively, as in scp -r source user@ip:dest, which transfers source to the path dest on the remote host^[600-developer-linux-centos7-command.md].
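
As a small illustration of that syntax, the sketch below assembles the remote target and prints the resulting scp command. The user name, address, and paths are hypothetical placeholders, and the DRY_RUN switch is an assumption of this example rather than part of scp.

```shell
#!/bin/sh
# Hypothetical values: replace with a real user, host, and paths.
remote_user="alice"
remote_host="192.0.2.10"   # address reserved for documentation examples
src="./project"
dest="/tmp/project"

# scp expects the remote side in the form user@host:path.
target="${remote_user}@${remote_host}:${dest}"

# Print the command by default; set DRY_RUN=0 to actually run the copy.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "scp -r $src $target"
else
  scp -r "$src" "$target"
fi
```

Printing first is a cheap way to check the target string before any data moves.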

Rsync is a synchronization utility that speeds up transfers of large amounts of data by comparing the source and destination before copying^[600-developer-linux-centos7-command.md]. Unlike a plain copy command, rsync transfers only files that differ between the two locations; it skips files that are already up to date and replaces only those that have changed^[600-developer-linux-centos7-command.md]. By default it decides this from file size and modification time rather than a full content comparison; the --checksum option forces a content check. Rsync also works as an incremental backup tool: by default it only adds or updates files on the destination and does not delete files there that are missing from the source, unless the --delete option is given^[600-developer-linux-centos7-command.md].
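
The add-or-update behavior can be observed locally without a remote host. The following is a minimal sketch that syncs between two throwaway directories; the directory layout and file names are made up for the demonstration, and it skips itself if rsync is not installed.

```shell
#!/bin/sh
# Local demo of rsync's default "add or update, never delete" behavior.
# All paths and file names here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "keep" > "$demo/src/keep.txt"
echo "old"  > "$demo/src/old.txt"

if command -v rsync >/dev/null 2>&1; then
  # -a (archive) recurses and preserves permissions and timestamps.
  # The trailing slash on src/ copies its contents, not the directory itself.
  rsync -a "$demo/src/" "$demo/dst/"

  # Remove a file from the source and sync again without --delete:
  # the copy in dst/ is left in place.
  rm "$demo/src/old.txt"
  rsync -a "$demo/src/" "$demo/dst/"
  ls "$demo/dst"    # keep.txt and old.txt are both still present

  # With --delete, files missing from the source are removed from dst/.
  rsync -a --delete "$demo/src/" "$demo/dst/"
  ls "$demo/dst"    # only keep.txt remains
else
  echo "rsync not installed; skipping demo" >&2
fi
```

Because --delete removes data on the destination, it is worth previewing such a run with rsync's -n (dry run) option first.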

Security and Automation

Automated file transfers must run without a password prompt, so SSH passwordless (key-based) login is frequently set up first^[600-developer-linux-centos7-command.md]. This involves generating a key pair (a private key and a public key) and appending the public key to the authorized_keys file on the target host^[600-developer-linux-centos7-command.md].
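
A minimal sketch of those two steps with the standard OpenSSH tools follows. The user name is hypothetical, the key is written to a temporary directory so the demo does not touch ~/.ssh, and the choice of the ed25519 key type is an assumption of this example. The copy step is only printed because it needs a reachable host.

```shell
#!/bin/sh
# Hypothetical user; hadoop102 is one of the sample host names in this article.
remote="alice@hadoop102"
key_dir=$(mktemp -d)
key_file="$key_dir/id_ed25519"

if command -v ssh-keygen >/dev/null 2>&1; then
  # -N "" sets an empty passphrase so scripts can use the key unattended;
  # -q suppresses the progress output.
  ssh-keygen -q -t ed25519 -N "" -f "$key_file" && ls "$key_dir"
fi

# Next step, printed rather than executed: ssh-copy-id appends the public
# key to ~/.ssh/authorized_keys on the target host.
echo "ssh-copy-id -i $key_file.pub $remote"
```

An empty passphrase trades security for convenience, so such a key is usually restricted to a dedicated account.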

Custom Synchronization Scripts

Administrators often create scripts to extend standard tools for distributing files across multiple nodes in a cluster:

  • xsync.sh: A shell script used to loop through a list of defined nodes and synchronize a specific file or directory to all of them^[600-developer-linux-centos7-command.md]. It typically uses rsync as the underlying engine to copy the source path to the same path on target hosts (e.g., hadoop101, hadoop102)^[600-developer-linux-centos7-command.md].
  • xcall.sh: A shell script designed to execute a specific command on all nodes in a cluster^[600-developer-linux-centos7-command.md]. While primarily for command execution, it is often grouped with synchronization utilities for cluster management^[600-developer-linux-centos7-command.md].
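
The two helpers described above can be sketched as shell functions. This is not the original xsync.sh or xcall.sh, only an illustration of the pattern: the NODES variable, the DRY_RUN switch, and the function bodies are assumptions of this example. It also assumes passwordless SSH is in place and that the parent directory already exists on each node.

```shell
#!/bin/sh
# Node list is an assumption; hadoop101 and hadoop102 are the sample hosts.
NODES="${NODES:-hadoop101 hadoop102}"

# xsync PATH: copy PATH to the same absolute path on every node.
xsync() {
  if [ ! -e "$1" ]; then
    echo "xsync: $1 does not exist" >&2
    return 1
  fi
  # Resolve the absolute parent directory and the base name so the
  # destination path matches the local one.
  parent=$(cd -P "$(dirname "$1")" && pwd)
  name=$(basename "$1")
  for node in $NODES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "rsync -a $parent/$name $node:$parent/"
    else
      rsync -a "$parent/$name" "$node:$parent/"
    fi
  done
}

# xcall COMMAND [ARGS...]: run the same command on every node over SSH.
xcall() {
  for node in $NODES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $node $*"
    else
      ssh "$node" "$@"
    fi
  done
}
```

With the default DRY_RUN=1, a call such as xsync /etc/hosts only prints one rsync command per node, which makes the loop easy to inspect before running it with DRY_RUN=0.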

Related

  • [[Command-line interface]]
  • [[SSH]]
  • [[Shell scripting]]

Sources

  • 600-developer-linux-centos7-command.md