The Gobblin distribution contains a CLI at
bin/gobblin that provides access to various Gobblin applications and commands.
The usage is
bin/gobblin <command> where the command specifies an application to run. Running
bin/gobblin -h provides a list of available commands.
The special command
bin/gobblin classpath is trapped by the bash script and simply displays the full classpath that Gobblin uses.
If running from an IDE, the main method for the CLI is
A note on Hadoop classpath
When running bin/gobblin, the script automatically finds the classpath for the job. Although Hadoop jars are included in the Gobblin distribution, if
HADOOP_HOME is set in the environment, Gobblin will instead use the classpath provided by the local Hadoop installation.
An important side effect of this is that (depending on the application) non-fully-qualified paths (like
/my/path) will default to the local file system if
HADOOP_HOME is not set, while they will default to HDFS if the variable is set. When referring to local paths, it is always a good idea to use the fully qualified path (e.g.
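As a minimal sketch of the advice above, the shell function below turns a local path into a fully qualified one before it is handed to the CLI. The function name qualify_local_path is hypothetical and not part of the Gobblin distribution; it only assumes that local paths are written with the file:// scheme.

```shell
# Hypothetical helper, not part of the Gobblin distribution.
# Prints a fully qualified file:// URI for a local path. Paths that already
# carry a scheme (hdfs://, file://, ...) are printed unchanged.
qualify_local_path() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;                   # scheme already present
    /*)    printf 'file://%s\n' "$1" ;;            # absolute local path
    *)     printf 'file://%s/%s\n' "$PWD" "$1" ;;  # relative path, anchored at the current directory
  esac
}

# Remind the user that unqualified paths may resolve differently when HADOOP_HOME is set.
if [ -n "${HADOOP_HOME:-}" ]; then
  echo "HADOOP_HOME is set: unqualified paths may resolve to HDFS" >&2
fi

qualify_local_path /my/path    # prints file:///my/path
```

Qualifying the path up front makes the command behave the same way whether or not HADOOP_HOME is set in the environment.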
Gobblin ingestion applications
Gobblin ingestion applications can be accessed through the command
bin/gobblin run [listQuickApps] [<quick-app>] -jobName <jobName> [OPTIONS]
For usage run
bin/gobblin run uses Embedded Gobblin and subclasses to run Gobblin ingestion jobs, giving CLI access to most functionality that could be achieved using
EmbeddedGobblin. For example, the following command will run a Hello World job (it will print "Hello World 1 !" somewhere in the logs).
bin/gobblin run -jobName helloWorld -setTemplate resource:///templates/hello-world.template
Knowing the path to each template and exactly which configurations to set is cumbersome. The alternative is to use a quick app. Running:
bin/gobblin run listQuickApps
will provide a list of available quick apps. To run a quick app:
bin/gobblin run <quick-app-name>
Quick apps may require additional arguments. For the usage of a particular app, run
bin/gobblin run <quick-app-name> -h.
For example, consider the quick app distcp:
$ bin/gobblin run distcp -h
usage: gobblin run distcp [OPTIONS] <source> <target>
 -delete                         Delete files in target that don't exist on source.
 -deleteEmptyParentDirectories   If deleting files on target, also delete newly empty parent directories.
 -distributeJar <arg>
 -h,--help
 -l                              Uses log to print out erros in the base CLI code.
 -mrMode
 -setConfiguration <arg>
 -setJobTimeout <arg>
 -setLaunchTimeout <arg>
 -setShutdownTimeout <arg>
 -simulate
 -update                         Specifies files should be updated if they're different in the source.
 -useStateStore <arg>
This provides usage for the app distcp, as well as listing all available options. Distcp could then be run:
bin/gobblin run distcp file:///source/path file:///target/path
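Options from the usage listing go before the positional source and target arguments. The sketch below illustrates that ordering with a hypothetical wrapper, distcp_cmd, which only prints the command line it would run; the wrapper is an illustration and not part of Gobblin, and it uses only options that appear in the listing above.

```shell
# Hypothetical wrapper, not part of Gobblin: prints the distcp command line
# instead of running it, so option placement can be checked first.
# Usage: distcp_cmd <source> <target> [OPTIONS]
distcp_cmd() (
  src="$1"; dst="$2"
  shift 2
  # Options come first, then <source> and <target>, matching the usage string.
  set -- bin/gobblin run distcp "$@" "$src" "$dst"
  echo "$*"
)

distcp_cmd file:///source/path file:///target/path -update -delete
# prints: bin/gobblin run distcp -update -delete file:///source/path file:///target/path
```

The function body runs in a subshell, so src and dst do not leak into the calling shell.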
Developing quick apps for the CLI
It is very easy to convert a subclass of
EmbeddedGobblin into a quick application for the Gobblin CLI. All that is needed is to implement an
EmbeddedGobblinCliFactory that knows how to instantiate the
EmbeddedGobblin from a
CommandLine object and annotate it with the
Alias annotation. There are two utility classes that make this very easy:
PublicMethodsGobblinCliFactory: this class will automatically infer CLI options from the public methods of a subclass of
EmbeddedGobblin. All the developer has to do is implement the method
constructEmbeddedGobblin(CommandLine) that calls the appropriate constructor of the desired
EmbeddedGobblin subclass with parameters extracted from the CLI. Additionally, it is a good idea to override
getUsageString() with the appropriate usage string. For an example, see
ConstructorAndPublicMethodsGobblinCliFactory: this class does everything
PublicMethodsGobblinCliFactory does, but it additionally automatically infers how to construct the
EmbeddedGobblin object from a constructor annotated with
EmbeddedGobblinCliSupport. For an example, see
Implementing new Gobblin commands
To implement a new Gobblin command that can be listed and executed using
bin/gobblin, implement the class
gobblin.runtime.cli.CliApplication, and annotate it with the
Alias annotation. The Gobblin CLI will automatically find the command, and users can invoke it by the Alias value.