Amazon S3 File System Access via HDFS Adapters

About The Samples

These samples illustrate how to access the Amazon S3 file system using the StreamBase® File Writer for Apache Hadoop Distributed File System (HDFS) and StreamBase® File Reader for Apache Hadoop Distributed File System (HDFS).

Importing This Sample into StreamBase Studio

In StreamBase Studio, import this sample with the following steps:

  • From the top-level menu, select File>Import Samples and Community Content.

  • Enter s3 to narrow the list of options.

  • Select S3 file system access via HDFS from the Large Data Storage and Analysis category.

  • Click Import Now.

StreamBase Studio creates a single project containing the sample files.

Initial Setup

Open the sample application you plan to run, FileBasic.sbapp or FileAdvanced.sbapp, select the Parameters tab, and edit the parameter values to specify:

  • Your current S3 bucket

  • Where you would like to store the sample data
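As a rough illustration of how a bucket name and a storage location relate, the following Python sketch joins the two into a single s3a:// URI. It is not part of the sample: the function name build_s3a_uri, the bucket my-example-bucket, and the idea that the two values are combined this way are assumptions made for illustration. Check the Parameters tab for the form the module actually expects.

```python
from urllib.parse import urlparse


def build_s3a_uri(bucket: str, path: str) -> str:
    """Join a bucket name and an object path into an s3a:// URI.

    Hypothetical helper: the sample's own parameters may be laid out differently.
    """
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without slashes")
    # Strip leading slashes so the result has exactly one separator.
    return f"s3a://{bucket}/{path.lstrip('/')}"


# "my-example-bucket" is a placeholder, not a real bucket.
uri = build_s3a_uri("my-example-bucket", "sample/Sample.txt")
parsed = urlparse(uri)
print(uri)
print(parsed.scheme, parsed.netloc, parsed.path)
```

Splitting the URI again with urlparse is a quick way to confirm that the bucket landed in the authority part and the file location in the path part.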

You must also open the sample S3 configuration file, sample_s3a.xml, and enter the security access keys for your S3 file system. For other ways to authenticate with S3, see Authenticating with S3.
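The Python sketch below shows what a key pair looks like in a Hadoop-style configuration document and how to read it back to confirm the values are present. It assumes that sample_s3a.xml follows the usual Hadoop configuration/property layout and uses the standard Hadoop S3A property names fs.s3a.access.key and fs.s3a.secret.key. The file shipped with the sample may contain other properties, and the function names and key values here are placeholders.

```python
import xml.etree.ElementTree as ET


def s3a_credentials_xml(access_key: str, secret_key: str) -> str:
    """Build a Hadoop-style <configuration> document holding an S3A key pair."""
    root = ET.Element("configuration")
    for name, value in (
        ("fs.s3a.access.key", access_key),
        ("fs.s3a.secret.key", secret_key),
    ):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def read_properties(xml_text: str) -> dict:
    """Return a {name: value} mapping for every <property> in the document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


# Placeholder values only; never commit real keys to source control.
xml_text = s3a_credentials_xml("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")
props = read_properties(xml_text)
missing = [k for k in ("fs.s3a.access.key", "fs.s3a.secret.key") if not props.get(k)]
print("missing properties:", missing)
```

A check like read_properties can be pointed at the text of your edited configuration file to catch an empty or misspelled property before you run the sample.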

Running The Basic Sample in StreamBase Studio

  1. In the Project Explorer view, open this sample's folder.

    Watch the status bar at the bottom right of the Studio window. Wait for any Updating, Downloading, Building, or Rebuild project messages to finish before you proceed.

  2. Open the src/main/eventflow/packageName folder.

  3. Double-click to open the FileBasic.sbapp module. Make sure the module is the currently active tab in the EventFlow Editor.

  4. Click the Run button. This opens the SB Test/Debug perspective and starts the module.

  5. Wait for the "Waiting for fragment to initialize" message to clear.

  6. In the Manual Input view, switch the Stream to Data, then enter a string value such as test.

  7. Click Send Data to send a data tuple to be written to the file. Repeat for as many lines as you wish.

  8. In the Output Streams view, observe tuples emitted on the Status output streams indicating actions performed on the file.

  9. In the Manual Input view, switch the Stream to WriteControl, then enter Close into the Command field.

  10. Click Send Data to send a control tuple, which closes the current file for writing.

  11. In the Manual Input view, switch the Stream to ReadControl, then click Send Data to send a control tuple, which reads the default file.

  12. Press F9 or click the Terminate EventFlow Fragment button.

  13. The sample has now created a file named sample/Sample.txt in your S3 file system, containing the lines of data you submitted.

Running The Advanced Sample in StreamBase Studio

  1. In the Project Explorer view, open this sample's folder.

    Watch the status bar at the bottom right of the Studio window. Wait for any Updating, Downloading, Building, or Rebuild project messages to finish before you proceed.

  2. Open the src/main/eventflow/packageName folder.

  3. Double-click to open the FileAdvanced.sbapp module. Make sure the module is the currently active tab in the EventFlow Editor.

  4. Click the Run button. This opens the SB Test/Debug perspective and starts the module.

  5. Wait for the "Waiting for fragment to initialize" message to clear.

  6. In the Output Streams view, observe tuples emitted on the Status output streams indicating actions performed on the files.

  7. Press F9 or click the Terminate EventFlow Fragment button.

  8. This demo has now created multiple files in your S3 file system:

    1. sample/Sample.gz — This file is a GZip compressed file created from the SampleIn.txt file.

    2. sample/Sample.gz2 — This file is a BZip2 compressed file created from the SampleIn.txt file.

    3. sample/Sample.zip — This file is a Zip compressed file created from the SampleIn.txt file.

    4. sample/SampleOut.txt — This file is an uncompressed file created from the SampleIn.txt file.
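To make the relationship between the four outputs concrete, the following Python sketch compresses one input with GZip, BZip2, and Zip, keeps an uncompressed copy, and then checks that each form decodes back to the same bytes. It runs entirely in memory with the Python standard library and does not touch S3 or the StreamBase adapters. The function names and the sample text are assumptions for illustration, not the contents of SampleIn.txt.

```python
import bz2
import gzip
import io
import zipfile


def compress_all(data: bytes, member_name: str = "SampleIn.txt") -> dict:
    """Return the input encoded as gzip, bzip2, zip, and unchanged."""
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, data)
    return {
        "gzip": gzip.compress(data),
        "bzip2": bz2.compress(data),
        "zip": zip_buffer.getvalue(),
        "none": data,
    }


def decompress(kind: str, payload: bytes) -> bytes:
    """Decode a payload produced by compress_all."""
    if kind == "gzip":
        return gzip.decompress(payload)
    if kind == "bzip2":
        return bz2.decompress(payload)
    if kind == "zip":
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # The archive written above holds exactly one member.
            return archive.read(archive.namelist()[0])
    if kind == "none":
        return payload
    raise ValueError(f"unknown compression kind: {kind}")


sample = b"first line\nsecond line\n"  # stand-in for the real input file
encoded = compress_all(sample)
for kind, payload in encoded.items():
    print(kind, len(payload), decompress(kind, payload) == sample)
```

The same decompress function can be applied to the bytes of files downloaded from your bucket to confirm that every output matches the original input.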

Sample Location

When you load the sample into StreamBase® Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.

Important

Load this sample in StreamBase® Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.

Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:

studio-workspace/sample_adapter_embedded_hdfsS3

See Default Installation Directories for the default location of studio-workspace on your system.