This sample demonstrates the use of the StreamBase® Regular Expression File Reader for Apache Hadoop Distributed File System (HDFS) in a StreamBase application that processes a text log file, storing the information extracted from the log in a query table. The log file is typical of a server log: it contains information about user logins, as well as other, spurious information. The format of the log file means it is not particularly suited to direct use in an application. It does not, for example, consist of CSV records that are ready to be turned into tuples; it must be parsed first and useful information extracted. Because the log file is text- and line-oriented, it is well suited to parsing using regular expressions.
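The line-by-line parsing described above can be sketched outside StreamBase as follows. This is a minimal illustration, not the adapter's implementation: the log line format, the `LOGIN_PATTERN` expression, and the `parse_login_lines` helper are all assumptions made for the example, and the real samplelog.txt may be laid out differently.

```python
import re

# Assumed line format (hypothetical; the real samplelog.txt may differ):
#   2024-01-15 08:30:12 LOGIN user=joe ip=192.168.1.10
LOGIN_PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) LOGIN "
    r"user=(?P<username>\w+) ip=(?P<ip>\d{1,3}(?:\.\d{1,3}){3})$"
)


def parse_login_lines(lines):
    """Return one dict per line that matches the login pattern; skip the rest."""
    records = []
    for line in lines:
        match = LOGIN_PATTERN.match(line.strip())
        if match:
            # Named groups become the fields of the extracted record.
            records.append(match.groupdict())
    return records


sample_lines = [
    "2024-01-15 08:30:12 LOGIN user=joe ip=192.168.1.10",
    "2024-01-15 08:30:15 INFO cache warmed in 120 ms",  # spurious line, ignored
    "2024-01-15 08:31:02 LOGIN user=fred ip=10.0.0.7",
]
records = parse_login_lines(sample_lines)
```

Lines that do not match the expression are simply dropped, which is how the spurious entries in a server log are filtered out in this sketch.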
Keep in mind that it is difficult to observe all log reading activities. The input adapter begins reading the input file and outputting tuples as soon as the application starts, before StreamBase Studio or external dequeuers have time to connect to the output of the reader. However, the application itself is fully functional, and all tuples read from the input file will be present in the query table.
You must open the RegexReader.sbapp file in the src/main/eventflow/packageName folder. Select the Parameters tab and edit the value to represent both your current HDFS setup and where you would like to store the sample data.
The samplelog.txt file used in the sample must be placed on your HDFS file system in the location you specified in the Parameters tab before this sample can run.
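One way to copy the file is with the standard `hdfs dfs` command-line client. The sketch below only builds the commands and runs them on request. It assumes the `hdfs` client is on your PATH and can reach your cluster. The destination directory `/user/streambase/sample` is a hypothetical placeholder, and the helper names are invented for this example. Substitute the location you entered in the Parameters tab.

```python
import subprocess


def build_hdfs_put_commands(local_file, hdfs_dir):
    """Build the hdfs CLI commands that create hdfs_dir and copy local_file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],           # create the target directory if missing
        ["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir],  # copy, overwriting any existing file
    ]


def upload(local_file, hdfs_dir):
    """Run the commands; requires the hdfs client on PATH and a reachable cluster."""
    for command in build_hdfs_put_commands(local_file, hdfs_dir):
        subprocess.run(command, check=True)


# Hypothetical destination; use the directory you entered in the Parameters tab.
commands = build_hdfs_put_commands("samplelog.txt", "/user/streambase/sample")
```

Calling `upload("samplelog.txt", "/user/streambase/sample")` would execute both commands in order and raise an error if either one fails.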
In StreamBase Studio, import this sample with the following steps:
- From the top-level menu, click > .
- Enter hdfs to narrow the list of options.
- Select HDFS Regular expression file input adapter from the Large Data Storage and Analysis category.
- Click .
  StreamBase Studio creates a project for the sample.
- In the Project Explorer view, open the sample you just loaded.
  If you see red marks on a project folder, wait a moment for the project to load its features.
  If the red marks do not resolve themselves after a minute, select the project, right-click, and select > from the context menu.
- Open the src/main/eventflow/packageName folder.
- Open the RegexReader.sbapp file and click the Run button. This opens the SB Test/Debug perspective and starts the module.
- Select the Manual Input tab.
- Enter joe, fred, bob, or max for Username, and click .
- The Output Streams view shows the time and IP address for the last queried user, according to the log.
- When done, press F9 or click the Terminate EventFlow Fragment button.
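The lookup performed in the last steps can be pictured as a keyed table that maps a username to a time and an IP address. The following is a rough analogy in plain Python, not StreamBase code: the `Login` record, the `build_table` and `lookup` helpers, and the sample rows are hypothetical, and the choice to keep only the most recent row per username is an assumption about how the query table is keyed.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Login(NamedTuple):
    username: str
    time: str
    ip: str


def build_table(logins: Iterable[Login]) -> Dict[str, Login]:
    """Index logins by username; a later row replaces an earlier one (assumed behavior)."""
    table: Dict[str, Login] = {}
    for login in logins:
        table[login.username] = login
    return table


def lookup(table: Dict[str, Login], username: str) -> Optional[Login]:
    """Return the stored row for username, or None if the user never logged in."""
    return table.get(username)


# Hypothetical rows standing in for tuples extracted from the log.
table = build_table([
    Login("joe", "2024-01-15 08:30:12", "192.168.1.10"),
    Login("fred", "2024-01-15 08:31:02", "10.0.0.7"),
    Login("joe", "2024-01-15 09:45:00", "192.168.1.22"),
])
result = lookup(table, "joe")
```

Here `result` holds the later of the two rows for `joe`, which mirrors entering a username on the Manual Input tab and reading the time and IP address from the Output Streams view.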
When you load the sample into StreamBase® Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.
Important
Load this sample in StreamBase® Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.
Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:
studio-workspace/sample_adapter_embedded_hdfsregexreader
See Default Installation Directories for the default location of studio-workspace on your system.