0.34.4 - 2014-06-23

  1. Switched default gcs-connector version to 1.2.7, which includes a fix for
     a bug where globs wrongly reported "not found" in some cases in Hadoop
     2.2.0.


0.34.3 - 2014-06-13

  1. JobTracker / ResourceManager recovery is now enabled by default to
     preserve job queues if the daemon dies.
  2. Fixed single_node_env.sh to work with hadoop2_env.sh.
  3. Two new commands were added to bdutil: socksproxy and shell; socksproxy
     will establish a SOCKS proxy to the cluster and shell will start an SSH
     session to the namenode.
  4. A new variable, GCE_NETWORK, was added to bdutil_env.sh and can be set
     from the command line via the --network flag when deploying a cluster or
     generating a configuration file. The network specified by GCE_NETWORK
     must already exist, must allow SSH connections from the host running
     bdutil, and must allow intra-cluster communication.
  5. Increased configured heap sizes of the master daemons (JobTracker,
     NameNode, SecondaryNameNode, and ResourceManager).
  6. The HADOOP_LOG_DIR is now /hadoop/logs instead of the default
     /home/hadoop/hadoop-install/logs; if using attached PDs for larger disk
     storage, this directory resides on that attached PD rather than the
     boot volume, so that Hadoop logs will no longer fill up the boot disk.
  7. Added new extensions under bdutil-<version>/extensions/spark, including
     spark_shark_env.sh and spark1_env.sh; both can also be combined with
     Hadoop 2. For now, neither uses Mesos or YARN, but both are suitable for
     single-user or Spark-only setups. The spark_shark_env.sh extension
     installs Spark + Shark 0.9.1, while spark1_env.sh installs only Spark
     1.0.0, in which case Spark SQL serves as the alternative to Shark.
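
  Example for item 4: a minimal sketch of an overrides file that selects a
  network. The file name my_network_env.sh and the network name
  'my-cluster-network' are placeholders, not part of bdutil, and the sketch
  assumes that overrides files set variables with plain shell assignments.

```shell
# my_network_env.sh (hypothetical overrides file)
# The named network must already exist, allow SSH from the host running
# bdutil, and allow intra-cluster communication.
GCE_NETWORK='my-cluster-network'
```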


0.34.2 - 2014-06-05

  1. When using Hadoop 2 / YARN with the default filesystem set to 'gs', YARN
     log aggregation is enabled, and YARN application logs, including
     map-reduce task logs, are persisted to gs://<CONFIGBUCKET>/yarn-logs/.


0.34.1 - 2014-05-12

  1. Fixed a bug in the USE_ATTACHED_PDS feature (also enabled with
     -d/--use_attached_pds) where disks didn't get attached properly.


0.34.0 - 2014-05-08

  1. Changed sample applications and tools to use GenericOptionsParser instead
     of creating a new Configuration object directly.
  2. Added printout of bdutil version number alongside "usage" message.
  3. Added sleeps between async invocations of GCE API calls during deployment,
     configurable with: GCUTIL_SLEEP_TIME_BETWEEN_ASYNC_CALLS_SECONDS
  4. Added tee'ing of client-side console output into debuginfo.txt with better
     delineation of where the error is likely to have occurred.
  5. For extensions/querytools/querytools_env.sh only, added an explicit
     mapred.working.dir to fix a bug where PigInputFormat crashes whenever the
     default FileSystem is different from the input FileSystem. This fix allows
     using GCS input paths in Pig with DEFAULT_FS='hdfs'.
  6. Added a retry-loop around "apt-get -y -qq update" since it may flake under
     high load.
  7. Significantly refactored bdutil into better-isolated helper functions, and
     added basic support for command-line flags and several new commands. The
     old command "./bdutil env1.sh env2.sh" is now
     "./bdutil -e env1.sh,env2.sh". Type ./bdutil --help for an overview of
     all the new functionality.
  8. Added better checking of env and upload files before starting deployment.
  9. Reorganized bdutil_env.sh into logical sections with better descriptions.
  10. Significantly reduced amount of console output; printed dots indicate
     progress of async subprocesses. Controllable with VERBOSE_MODE or '-v'.
  11. Script and file dependencies are now staged through GCS rather than using
     gcutil push; drastically decreases bandwidth and improves scalability.
  12. Added MAX_CONCURRENT_ASYNC_PROCESSES to split the async loops into
     multiple smaller batches, to avoid running out of memory.
  13. Made delete_cluster continue on error, still reporting a warning at the
     end if errors were encountered. This way, previously-failed cluster
     creations or deletions with partial resources still present can be
     cleaned up by retrying the "delete" command.
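
  Example for items 6, 12, and 13: a minimal sketch of the general shell
  techniques involved (a retry loop, batched background jobs, and counting
  failures instead of aborting). This is not bdutil's actual code; the
  function names retry, run_in_batches, and process_item are hypothetical,
  and process_item is only a stand-in for a real per-VM operation.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical, not bdutil's own.

# Retry a command up to max_attempts times, sleeping between attempts.
# Usage: retry MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
retry() {
  max_attempts=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$sleep_seconds"
  done
  return 0
}

# Stand-in for a per-VM operation; fails only for the item named "bad".
process_item() {
  [ "$1" != "bad" ]
}

# Start one background job per item, but wait for each batch of batch_size
# jobs to finish before starting more. Failures are counted rather than
# aborting the loop, and a warning is printed at the end.
# Usage: run_in_batches BATCH_SIZE ITEM [ITEM...]
run_in_batches() {
  batch_size=$1
  shift
  failures=0
  pids=''
  running=0
  for item in "$@"; do
    process_item "$item" &
    pids="$pids $!"
    running=$((running + 1))
    if [ "$running" -ge "$batch_size" ]; then
      for pid in $pids; do
        wait "$pid" || failures=$((failures + 1))
      done
      pids=''
      running=0
    fi
  done
  # Wait for the final, possibly smaller, batch.
  for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
  done
  if [ "$failures" -gt 0 ]; then
    echo "Warning: $failures item(s) failed" >&2
    return 1
  fi
  return 0
}
```

  For instance, "retry 3 5 apt-get -y -qq update" would attempt the update up
  to three times with a five-second pause, and "run_in_batches 2 vm-0 vm-1
  vm-2" would process the three placeholder names two at a time.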


0.33.1 - 2014-04-09

  1. Added deployment scripts for the BigQuery and Datastore connectors.
  2. Added sample jarfiles for the BigQuery and Datastore connectors under
     a new /samples/ subdirectory along with scripts for running the samples.
  3. Set the default image type to backports-debian-7 for improved networking.


0.33.0 - 2014-03-21

  1. Renamed 'ghadoop' to 'bdutil' and ghadoop_env.sh to bdutil_env.sh.
  2. Bundled a collection of *-site.xml.template files in conf/ subdirectory
     which are integrated into the hadoop conf/ files in the remote scripts.
  3. Switched core-site template to new 'fs.gs.auth.*' syntax for
     enabling service-account auth.


0.32.0 - 2014-02-12

  1. ghadoop now always includes ghadoop_env.sh; only the overrides file needs
     to be specified, e.g. ghadoop deploy single_node_env.sh.
  2. Files in COMMAND_GROUPS are now relative to the directory in which ghadoop
     resides, rather than having to be inside libexec/. Absolute paths are
     also supported now.
  3. Added UPLOAD_FILES to ghadoop_env.sh which ghadoop will use to upload
     a list of relative or absolute file paths to every VM before starting
     execution of COMMAND_STEPS.
  4. Included the full Hive and Pig sample app from Cloud Solutions with
     ghadoop; added extensions/querytools/querytools_env.sh to auto-install
     Hive and Pig as part of deployment. Usage:
         ./ghadoop deploy extensions/querytools/querytools_env.sh


0.31.1 - 2014-01-23

  1. Added CHANGES.txt for release notes.
  2. Switched from /hadoop/temp to /hadoop/tmp.
  3. Added support for running ghadoop as root; will display additional
     confirmation prompt before starting.
  4. run_gcutil_cmd() now displays the full copy/paste-able gcutil command.
  5. Now, only the public key of the master-generated ssh keypair is copied
     into GCS during setup and onto the datanodes. This fixes the occasional
     failed deployment due to GCS list-consistency, and is cleaner anyway.
     The ssh keypair is now more descriptively named: 'hadoop_master_id_rsa'.
  6. Added check for sshability from master to workers in start_hadoop.sh.
  7. Cleaned up internal gcutil commands, added printout of full command
     to ssh into the namenode at the end of the deployment.
  8. Removed indirect config references from *-site.xml.
  9. Stopped explicitly setting mapred.*.dir.


0.31.0 - 2014-01-14

  1. Preview release of ghadoop.
