Custom Functions with Complex Data Types

This topic describes how to create custom functions that take or return a complex StreamBase data type such as list or tuple. There are two cases:

  • For custom C++ functions, you must register the argument or return type in the <custom-function> section of a StreamBase Server configuration file.

  • For custom Java functions, you must either:

    • Create a custom resolver annotation and class.

    • Register with a <custom-function> section of the configuration file as is done for C++ functions.

Resolving Complex Data Types for C++ Functions

Custom C++ functions must be declared in a <custom-function> element of a StreamBase Server configuration file. The custom function's argument and return data types are specified using <arg> and <return> elements.

If the type attribute of these elements is list or tuple, the element must have a child element that further describes the data type. For tuple types, this is a <schema> element. For list types, this is an <element-type> element. For example:

<custom-functions>
  <custom-function name="DownWithFunc" 
    type="simple" alias="downwid">
    <args>
      <arg type="tuple">
        <schema>
          <field name="f1" type="int" />
          <field name="f2" type="double" />
        </schema>
      </arg>
      <arg type="list">
        <element-type type="string" />
      </arg>
    </args>
    <return type="list">
      <element-type type="tuple">
        <schema>
          <field name="g1" type="blob" />
          <field name="g2" type="timestamp" />
        </schema>
      </element-type>
    </return>
  </custom-function>
</custom-functions>

The example above declares a function that takes two arguments:

  • A tuple whose fields are an int and a double

  • A list of strings

The function returns:

  • A list of tuples whose fields are a blob and a timestamp

See <custom-functions> for details on the <custom-function>, <args>, <return>, <arg>, <schema>, <field>, and <element-type> elements of the server configuration file.

Resolving Complex Data Types for Java Functions

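Custom Java functions can take or return a complex data type such as list or tuple. There are two ways to make such functions usable in a StreamBase application: describe their argument and return types with configuration file elements, or associate them with a custom function resolver. The following sections describe each approach.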

Java Functions: Using Configuration File Elements

Java functions do not need <custom-function> elements in the server configuration file if the functions are only called by means of the calljava() function. When used, <custom-function> elements are typically written with the args="auto" and alias= attributes to define an alias for the function, which avoids having to use calljava() to invoke it.

If a Java function takes or returns a complex data type, and you do not want to use a custom function resolver, then you must specify a <custom-function> element for the function, and must not use the args="auto" attribute. You must then specify <arg> and <return> elements to describe each argument and return type, using the syntax described above for C++ functions.

Use the configuration file definition method only when your list arguments or returns have a single, non-changing list element type, such as an argument of type list(string) or list(double), and so on. If your function is designed to take lists of several types (any of list(int), list(long), or list(double), for example), or if your function returns a list whose element type is not known until the function runs, then use a custom function resolver.

There are cases where it is preferable to use <arg type=> settings instead of a custom function resolver. For example, your function might have non-StreamBase uses, and might have many overloaded argument signatures for its several use cases. If you only want to expose a subset of those signatures for use in your StreamBase application, you can specify exactly the arguments you want to support using explicit <arg type=> entries.

See <custom-functions> for details on the <custom-function>, <args>, <return>, <arg>, <schema>, <field>, and <element-type> elements of the server configuration file.

Java Functions: Using Custom Function Resolvers

This section explains the purpose of a custom function resolver, describes how to write one, and walks through an example.

Purpose of a Custom Function Resolver

When integrating a custom Java function with StreamBase, the server tries to automatically map between StreamBase data types and their Java equivalents. For example, a StreamBase int is mapped to a Java int, while a StreamBase string is mapped to a Java String.

When it comes to complex data types, the automatic mapping cannot occur. Even though the natural representation for a list(int) or list(string) in Java is a List<Integer> or List<String>, respectively, at the JVM level, these types don't exist. Both types are reduced to an untyped List after Java type erasure. For this reason, the server cannot automatically infer the type of an argument or return for parameterized data types.
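The effect of type erasure can be seen in a few lines of plain Java. The following is a minimal sketch that uses only the standard library; the class name ErasureDemo and its method are hypothetical names chosen for illustration and are not part of the StreamBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: shows that the element type of a generic List
// is not part of the list's runtime class.
class ErasureDemo {
    // Returns true when a List<Integer> and a List<String> share one runtime class.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        return ints.getClass() == strings.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                    // prints "true"
        System.out.println(new ArrayList<Integer>().getClass().getName()); // prints "java.util.ArrayList"
    }
}
```

Because both lists report the same runtime class, inspecting a method that takes or returns a List cannot by itself reveal which StreamBase list element type is intended.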

The solution is to provide a custom function resolver method associated with the custom Java function. The custom function resolver method has the following purposes:

  • To determine whether it is valid to call the custom function with arguments of a particular type.

  • To specify the type of the function's result.

Writing a Custom Function Resolver

Custom function resolvers are called at build time for an application that refers to custom functions with associated resolvers.

A custom function's resolver method is specified using the Java CustomFunctionResolver annotation. For example:

@CustomFunctionResolver("myCustomFnResolver")
public static List myCustomFn(int i, Tuple t) {
...
}

public static CompleteDataType myCustomFnResolver(CompleteDataType arg1, CompleteDataType arg2) {
...
}

Both the annotation and the stub of a resolver class are generated for you when you use the New StreamBase Java Functions wizard in Studio to generate function code.

A resolver takes as many arguments as the method that refers to it. The types of all of these arguments must be CompleteDataType. The return type of the resolver must also be CompleteDataType.

For custom simple functions, a resolver can return either:

  • null, to indicate that the custom function cannot be called with arguments whose types are specified as the parameters to the resolver.

  • A valid CompleteDataType that characterizes the type of value returned by the custom function.

For custom simple functions that return simple data types (that is, the Java equivalents of the simple StreamBase types blob, bool, double, int, long, string, and timestamp), the returned CompleteDataType must be the appropriate corresponding simple type. For example, the return for an int must be as returned by CompleteDataType.forInt().

For custom simple functions that return complex data types (that is, the Java equivalents of the complex StreamBase types tuple and list), the returned CompleteDataType must fully describe the appropriate complex type (as returned by CompleteDataType.forTuple(Schema) or CompleteDataType.forList(CompleteDataType)).
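The accept-or-reject contract described above can be modeled without the StreamBase libraries. The following is a sketch under stated assumptions: SketchType is a hypothetical stand-in for a type descriptor and is not the CompleteDataType class, and sumResolver is a hypothetical resolver for an imagined function that accepts list(int), list(long), or list(double) and returns a value of the list's element type. A real resolver would take and return CompleteDataType objects instead.

```java
import java.util.Objects;

// Hypothetical stand-in for a type descriptor; NOT the StreamBase CompleteDataType class.
final class SketchType {
    final String name;         // "int", "long", "double", "string", or "list"
    final SketchType element;  // element type for lists, otherwise null

    private SketchType(String name, SketchType element) {
        this.name = name;
        this.element = element;
    }

    static SketchType simple(String name) { return new SketchType(name, null); }
    static SketchType listOf(SketchType element) { return new SketchType("list", element); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SketchType)) return false;
        SketchType other = (SketchType) o;
        return name.equals(other.name) && Objects.equals(element, other.element);
    }

    @Override public int hashCode() { return Objects.hash(name, element); }
}

class SumResolverSketch {
    // Hypothetical resolver: returns the result type when the argument type
    // is acceptable, or null to signal that the call is not valid.
    static SketchType sumResolver(SketchType arg) {
        if (arg == null || arg.element == null) {
            return null; // not a list
        }
        String e = arg.element.name;
        if (e.equals("int") || e.equals("long") || e.equals("double")) {
            return arg.element; // result has the same type as the list elements
        }
        return null; // unsupported element type, such as list(string)
    }
}
```

In this model, a list(int) argument resolves to int, while a list(string) argument or a non-list argument resolves to null, which corresponds to the function being rejected for those argument types.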

For custom aggregate functions, only the accumulate() methods can have associated resolvers. Even though the accumulate() methods return no value, the return type of their resolvers (if present) must be a CompleteDataType. If a resolver returns a non-null value, the value is taken to represent the return type of the custom aggregate function as a whole, when used with arguments of the type specified by the chosen accumulate() method.

See the Javadoc for CustomFunctionResolver for more information.

Custom Function Resolver Example

Consider the following example:

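import java.util.ArrayList;
import java.util.List;
// CompleteDataType and CustomFunctionResolver must also be imported from the StreamBase API.

class ExprUtil {
  @CustomFunctionResolver("concatenateResolver")
  public static List<?> concatenate(List<?> v1, List<?> v2) {
    // A List<?> cannot be added to, so accumulate into a List<Object>.
    List<Object> res = new ArrayList<Object>();
    if (v1 != null) { res.addAll(v1); }
    if (v2 != null) { res.addAll(v2); }
    return res;
  }

  public static CompleteDataType concatenateResolver(
        CompleteDataType arg1, CompleteDataType arg2) {
    if (arg1.getElementType() == null) {
      return null; // not a list
    }
    // Accept the call only when both arguments have the same list type.
    if (arg1.equals(arg2)) { return arg1; }
    return null;
  }
}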

The concatenate() function defined in this example takes two list arguments, as long as both lists have the same element type. The function returns a list of the same element type.

The @CustomFunctionResolver annotation associates the concatenate() method with its resolver, the concatenateResolver() method. At typecheck time, the resolver is called as necessary to determine whether the function's arguments are consistent with the definition of the function. Then, at runtime, the concatenate() function is called with the actual argument values.