Scala for Machine Learning

Time series

Most of the examples used to illustrate the machine learning algorithms in this book process time series or sequential data, whether ordered or unordered.

Each library has its own container type to manipulate datasets. The challenge is to define all the conversions between the types of the different libraries needed to implement a large variety of machine learning models. Such a strategy may result in a combinatorial explosion of implicit conversions. A solution consists of creating a generic class that manages conversions to and from any type used by a third-party library.

Note

scala.collection.JavaConversions._

Scala provides a standard package to convert collection types from Scala to Java and vice versa.
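As a minimal sketch of this interoperability, the following example converts a Java list into a Scala collection and back. It assumes Scala 2.13 or later, where the implicit JavaConversions package has been removed and the explicit asScala and asJava decorators of scala.jdk.CollectionConverters take its place (earlier versions offer the same decorators through scala.collection.JavaConverters). The object name and the javaPrices helper are hypothetical and stand in for a call to a Java library.

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.jdk.CollectionConverters._

object JavaInteropSketch {
  // Hypothetical stand-in for a Java library call that returns a Java list
  def javaPrices(): JList[java.lang.Double] = {
    val xs = new JArrayList[java.lang.Double]()
    xs.add(java.lang.Double.valueOf(1.5))
    xs.add(java.lang.Double.valueOf(2.5))
    xs.add(java.lang.Double.valueOf(4.0))
    xs
  }

  // Java -> Scala: asScala wraps the list, map unboxes each element
  def toScalaVector(xs: JList[java.lang.Double]): Vector[Double] =
    xs.asScala.map(_.doubleValue).toVector

  // Scala -> Java: box each element, then expose the sequence as a java.util.List
  def toJavaList(xs: Seq[Double]): JList[java.lang.Double] =
    xs.map(x => java.lang.Double.valueOf(x)).asJava
}
```

The explicit decorators make each crossing between the two collection hierarchies visible in the code, which is one reason they replaced the fully implicit conversions.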

The generic data transformation, DT, can be used to transform any XTSeries time series:

abstract class DT[T, U] extends PipeOperator[XTSeries[T], XTSeries[U]] {
  // Concrete transformations supply the partial function
  override def |> : PartialFunction[XTSeries[T], XTSeries[U]]
}

Let's consider the simple case of using two Java libraries, the Apache Commons Math framework and JFreeChart for visualization, and define a parameterized time series class, XTSeries[T]. The |> data transformation converts a time series of values of type T, XTSeries[T], into a time series of values of type U, XTSeries[U]. The following diagram provides an overview of type conversion in data transformation:
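To make the role of the |> transformation concrete, here is a minimal, self-contained sketch. The shape of the PipeOperator trait is an assumption based on how it is used in this section, the XTSeries case class is a simplified stand-in for the class developed below, and ElementWise is a hypothetical transformation that is not part of the library.

```scala
object DataTransformSketch {
  // Assumed shape of the PipeOperator trait introduced earlier in the book
  trait PipeOperator[-T, +U] {
    def |> : PartialFunction[T, U]
  }

  // Simplified stand-in for XTSeries, reduced to what this sketch needs
  final case class XTSeries[T](label: String, values: Vector[T])

  // Hypothetical transformation: applies f to every element of a non-empty series
  class ElementWise[T, U](f: T => U)
      extends PipeOperator[XTSeries[T], XTSeries[U]] {
    override def |> : PartialFunction[XTSeries[T], XTSeries[U]] = {
      case xt if xt.values.nonEmpty => XTSeries(xt.label, xt.values.map(f))
    }
  }

  // XTSeries[Int] => XTSeries[Double]
  val toDouble = new ElementWise[Int, Double](_.toDouble)
  val result: XTSeries[Double] = toDouble |> XTSeries("counts", Vector(1, 2, 3))
}
```

Returning a partial function lets a transformation declare the inputs it accepts (here, non-empty series), so client code can test isDefinedAt before applying it.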

Let's create the XTSeries class. As a container, the class should implement the Scala higher-order collection methods such as map, foreach, or zip. The class should support at least the conversion to the DblVector and DblMatrix types introduced in the first chapter.

Here is a partial implementation of the XTSeries class. Comments, exceptions, argument validations, and debugging code are omitted in the code:

import scala.annotation.implicitNotFound
import scala.reflect.ClassTag

class XTSeries[@specialized(Double) T](label: String, arr: Array[T]) { // 1
  def apply(n: Int): T = arr.apply(n)

  @implicitNotFound("Undefined conversion to DblVector") // 2
  def toDblVector(implicit f: T => Double): DblVector = arr.map(f(_))

  @implicitNotFound("Undefined conversion to DblMatrix") // 2
  def toDblMatrix(implicit fv: T => DblVector): DblMatrix = arr.map(fv(_))

  def + (n: Int, t: T)(implicit f: (T, T) => T): T = f(arr(n), t)

  def head: T = arr.head // 3
  def drop(n: Int): XTSeries[T] = XTSeries(label, arr.drop(n))
  def map[U: ClassTag](f: T => U): XTSeries[U] = XTSeries[U](label, arr.map(x => f(x)))
  def foreach(f: T => Unit): Unit = arr.foreach(f) // 3
  def sortWith(lt: (T, T) => Boolean): XTSeries[T] = XTSeries[T](label, arr.sortWith(lt))
  def max(implicit cmp: Ordering[T]): T = arr.max // 4
  def min(implicit cmp: Ordering[T]): T = arr.min
…
}

The class takes an optional label and an invariant array of the parameterized type T. The annotation @specialized (line 1) instructs the compiler to generate two versions of the class:

  • A generic XTSeries[T] class that exploits all the implicit conversions required to perform operations on time series of a generic type
  • An optimized XTSeries[Double] class that bypasses the conversion and offers client code a faster implementation
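The effect of the annotation can be sketched on a much smaller container. The example assumes the Scala 2 compiler targeted by this book; the Series class and the value names are hypothetical and only illustrate where the annotation goes and how client code is unaffected by it.

```scala
object SpecializedSketch {
  // Hypothetical container: @specialized(Double) asks the Scala 2 compiler to
  // generate an extra variant of the class that works on unboxed Double values
  class Series[@specialized(Double) T](val values: Array[T]) {
    def apply(n: Int): T = values(n)
    def head: T = values.head
  }

  // Under Scala 2, T = Double selects the specialized variant
  val prices = new Series(Array(10.5, 11.0, 9.75))
  // Any other element type falls back to the generic variant
  val labels = new Series(Array("a", "b"))
}
```

Client code is identical in both cases; the choice between the generic and the specialized variant is made by the compiler from the type argument.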

The conversion to DblVector (resp. DblMatrix) relies on the implicit conversion of elements to type Double (resp. DblVector) (line 2). The @implicitNotFound annotation instructs the compiler to emit the specified error message if no implicit conversion is found. The toDblVector and toDblMatrix methods are used to implement the implicit conversions introduced in the previous section, which are defined in the singleton org.scalaml.core.Types.CommonMath. The following code shows the implementation of these implicit conversions:

object Types {
  object CommonMath {
    implicit def series2DblVector[T](xt: XTSeries[T])(implicit f: T => Double): DblVector = xt.toDblVector(f)
    implicit def series2DblMatrix[T](xt: XTSeries[T])(implicit f: T => DblVector): DblMatrix = xt.toDblMatrix(f)
  }
}

The XTSeries class exposes a subset of the Scala higher-order collection methods (line 3) applied to the time series. The computation of the minimum and maximum values in the time series requires that an ordering, cmp, be defined for the elements of type T (line 4).
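The following sketch exercises the implicit conversion, the higher-order map method, and the ordering-based min and max methods together. It is a reduced stand-in for XTSeries and not the library code: the DblVector alias is assumed to match the one from the first chapter, and the Tick element type, its two implicit values, and the mean function are hypothetical.

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag

object XTSeriesUsageSketch {
  type DblVector = Array[Double] // assumed to match the alias from the first chapter

  // Reduced stand-in for XTSeries with only the members exercised below
  class XTSeries[T](val label: String, arr: Array[T]) {
    def toDblVector(implicit f: T => Double): DblVector = arr.map(f(_))
    def map[U: ClassTag](f: T => U): XTSeries[U] = new XTSeries[U](label, arr.map(f))
    def max(implicit cmp: Ordering[T]): T = arr.max
    def min(implicit cmp: Ordering[T]): T = arr.min
  }

  // Hypothetical element type with its conversion to Double and its ordering
  final case class Tick(price: Double)
  implicit val tickToDouble: Tick => Double = _.price
  implicit val byPrice: Ordering[Tick] = Ordering.by[Tick, Double](_.price)

  // Mirrors series2DblVector: a series can be passed where a DblVector is expected
  implicit def series2DblVector[T](xt: XTSeries[T])(implicit f: T => Double): DblVector =
    xt.toDblVector(f)

  // Hypothetical consumer that only understands DblVector
  def mean(xs: DblVector): Double = xs.sum / xs.length

  val ticks = new XTSeries[Tick]("ticks", Array(Tick(2.0), Tick(4.0), Tick(6.0)))
  val average: Double = mean(ticks) // implicit XTSeries[Tick] => DblVector
  val doubled: XTSeries[Tick] = ticks.map(t => Tick(t.price * 2.0))
  val range: (Tick, Tick) = (doubled.min, doubled.max) // resolved through byPrice
}
```

Removing either implicit value for Tick turns the corresponding call into a compile-time error, which is the behavior the generic class relies on to reject unsupported element types.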

Let's put our versatile XTSeries class to use in creating a basic preprocessing data transformation starting with the ubiquitous moving average techniques.