StreamlinerCore

StreamlinerCore is a Julia library to generate, train, and evaluate models defined via configuration files.

Data interface

StreamlinerCore.AbstractData — Type

AbstractData{N}

Abstract type representing streamers of N datasets. In general, StreamlinerCore uses N = 1 to validate and evaluate trained models, and N = 2 to train models using a training and a validation dataset.

Subtypes of AbstractData are meant to implement the following methods:

stream,
get_templates,
get_metadata,
get_nsamples.
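
A minimal sketch of what implementing this interface can look like, written as a standalone mock-up that does not load StreamlinerCore. MockData, MockTemplate, and InMemoryData are hypothetical stand-ins, returning a NamedTuple from get_templates is an assumption, and stream is omitted. In a real package one would presumably subtype StreamlinerCore.AbstractData and extend StreamlinerCore's own functions instead of defining new ones.

```julia
# Standalone mock-up: these definitions only mimic the shape of the
# StreamlinerCore data interface and do not import the package.
abstract type MockData{N} end

# Hypothetical stand-in for StreamlinerCore.Template.
struct MockTemplate{T, N}
    size::NTuple{N, Int}
end
MockTemplate(::Type{T}, size::NTuple{N, Int}) where {T, N} = MockTemplate{T, N}(size)

# N = 2: a training and a validation dataset, one sample per matrix column.
struct InMemoryData <: MockData{2}
    train::Matrix{Float32}
    valid::Matrix{Float32}
    name::String
end

# Number of samples in each of the N datasets.
get_nsamples(d::InMemoryData) = (size(d.train, 2), size(d.valid, 2))

# One template per streamed array; the sample dimension is not included.
get_templates(d::InMemoryData) = (input = MockTemplate(Float32, (size(d.train, 1),)),)

# Information that identifies the data uniquely (illustrative keys).
get_metadata(d::InMemoryData) =
    Dict{String, Any}("name" => d.name, "nfeatures" => size(d.train, 1))

data = InMemoryData(rand(Float32, 3, 80), rand(Float32, 3, 20), "toy")
get_nsamples(data)  # (80, 20)
```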

StreamlinerCore.stream — Function

stream(f, data::AbstractData, partition::Integer, streaming::Streaming)

Stream the given partition of data in batches of size batchsize on a given device, and return the result of applying f to the resulting batch iterator. Shuffling is optional and is controlled by shuffle (a boolean) and by the random number generator rng.

The options device, batchsize, shuffle, rng are passed via the configuration struct streaming::Streaming. See also Streaming.
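
The batching logic described above can be sketched as follows. This is a standalone mock-up, and MockStreaming and mock_stream are hypothetical names. Samples are stored as matrix columns and split into batches, which are shuffled with rng when shuffle is true, and f receives the lazy batch iterator. The partition argument and the transfer to device are left out.

```julia
using Random

# Hypothetical stand-in for the options carried by Streaming
# (the real struct also holds the device, which is ignored here).
Base.@kwdef struct MockStreaming
    batchsize::Int
    shuffle::Bool = false
    rng::AbstractRNG = Random.default_rng()
end

# Split the columns (samples) of X into batches of at most s.batchsize,
# optionally shuffling the sample order, and return f applied to the
# lazy iterator over batches.
function mock_stream(f, X::AbstractMatrix, s::MockStreaming)
    idxs = collect(axes(X, 2))
    s.shuffle && shuffle!(s.rng, idxs)
    batches = (X[:, chunk] for chunk in Iterators.partition(idxs, s.batchsize))
    return f(batches)
end

X = reshape(Float32.(1:20), 2, 10)  # 10 samples with 2 features each
sizes = mock_stream(X, MockStreaming(batchsize = 4)) do batches
    [size(b, 2) for b in batches]
end
# sizes == [4, 4, 2]
```

Passing f, rather than returning the iterator, mirrors the stream(f, ...) signature and lets a real implementation release resources once f returns.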

StreamlinerCore.ingest — Function

ingest(data::AbstractData{1}, eval_stream, select)

Ingest the output of evaluate into a suitable database, tensor, or iterator. select determines which fields of the model output to keep.
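
As a rough illustration of the select argument, the hypothetical mock_ingest below keeps only the selected fields of each batch output and concatenates them into one tensor per field. It assumes that model outputs are NamedTuples with the minibatch dimension last, and it drops the data argument. The real ingest may behave differently.

```julia
# Keep the fields listed in `select` from each batch output and
# concatenate each field along its last (minibatch) dimension.
function mock_ingest(eval_stream, select::Tuple{Vararg{Symbol}})
    kept = [NamedTuple{select}(map(k -> out[k], select)) for out in eval_stream]
    return NamedTuple{select}(map(select) do k
        arrays = [nt[k] for nt in kept]
        cat(arrays...; dims = ndims(first(arrays)))
    end)
end

# Two mock batches of model output, with 4 and 3 samples respectively.
outs = [(prediction = ones(2, 4), hidden = zeros(5, 4)),
        (prediction = ones(2, 3), hidden = zeros(5, 3))]
res = mock_ingest(outs, (:prediction,))
# res has only the :prediction field, with size (2, 7)
```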

StreamlinerCore.get_templates — Function

get_templates(data::AbstractData)

Extract templates for data. Templates encode the type and size of the arrays that data will stream. See also Template.

StreamlinerCore.get_metadata — Function

get_metadata(x)::Dict{String, Any}

Extract metadata for x. The metadata should be a dictionary of information that identifies x uniquely. get_metadata has methods for AbstractData, Model, and Training.

StreamlinerCore.get_nsamples — Function

get_nsamples(data::AbstractData{N})::NTuple{N, Int} where {N}

Return the number of samples in each of the N datasets of data.

StreamlinerCore.Template — Type

Template(::Type{T}, size::NTuple{N, Int}) where {T, N}

Create an object of type Template, representing arrays with element type T and size size. Note that size does not include the minibatch dimension.
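
To make the role of the minibatch dimension concrete, this sketch uses a hypothetical MockTemplate that mirrors the constructor above, and it allocates a batch by appending the batch size as the last dimension. Putting the batch dimension last follows the Flux.jl convention and is an assumption about StreamlinerCore.

```julia
# Hypothetical stand-in for StreamlinerCore.Template.
struct MockTemplate{T, N}
    size::NTuple{N, Int}
end
MockTemplate(::Type{T}, size::NTuple{N, Int}) where {T, N} = MockTemplate{T, N}(size)

# Allocate an uninitialized batch for the template. The template size
# describes a single sample, so the batch size is appended as the last
# dimension (assumed convention).
batch_like(t::MockTemplate{T}, batchsize::Int) where {T} =
    Array{T}(undef, t.size..., batchsize)

t = MockTemplate(Float32, (28, 28, 1))  # e.g. one grayscale image
x = batch_like(t, 32)
# size(x) == (28, 28, 1, 32)
```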

Parser

StreamlinerCore.Parser — Type

Parser(;
    model, layers, sigmas, aggregators, metrics, regularizations,
    optimizers, schedules, stoppers, devices
)

Collection of dictionaries to perform the necessary conversion from the user-specified configuration file or dictionary to Julia objects.

For most use cases, one should define a default parser

parser = default_parser()

and pass it to Model and Training upon construction.

A parser object is also required to use interface functions that read from the MongoDB:

finetune,
loadmodel,
validate,
evaluate.

See default_parser for more advanced uses.

StreamlinerCore.default_parser — Function

default_parser(; plugins::AbstractVector{Parser}=Parser[])

Return a parser::Parser object that includes StreamlinerCore defaults together with optional plugins.
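
The idea of a parser as a collection of dictionaries that plugins can extend can be illustrated with a small mock-up. The field names, the entries, and the merge-based combination of plugins (later plugins overriding earlier entries) are assumptions for illustration only. They are not StreamlinerCore's implementation.

```julia
# Hypothetical mock: a "parser" as a NamedTuple of dictionaries mapping
# configuration strings to Julia objects.
relu(x) = max(zero(x), x)

const DEFAULT_PARSER = (
    sigmas = Dict{String, Any}("relu" => relu, "tanh" => tanh),
    aggregators = Dict{String, Any}("mean" => x -> sum(x) / length(x)),
)

# Combine the defaults with each plugin, field by field. `merge` returns
# new dictionaries, so DEFAULT_PARSER itself is never modified.
function mock_default_parser(; plugins = [])
    return foldl(plugins; init = DEFAULT_PARSER) do acc, p
        map(merge, acc, p)
    end
end

softplus(x) = log1p(exp(x))
plugin = (
    sigmas = Dict{String, Any}("softplus" => softplus),
    aggregators = Dict{String, Any}(),
)
parser = mock_default_parser(plugins = [plugin])
parser.sigmas["relu"](-1.0)  # returns 0.0
```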

Parsed objects

StreamlinerCore.Model — Type

Model(parser::Parser, metadata::AbstractDict)

Model(parser::Parser, path::AbstractString, [vars::AbstractDict])

Create a Model object from a configuration dictionary metadata or, alternatively, from a configuration dictionary stored at path in TOML format. The optional argument vars is a dictionary of variables that can be used to fill the template given in path.

The parser::Parser handles conversion from configuration variables to Julia objects.

Given a model::Model object, use model(data) where data::AbstractData to instantiate the corresponding neural network or machine.

StreamlinerCore.Training — Type

Training(parser::Parser, metadata::AbstractDict)

Training(parser::Parser, path::AbstractString, [vars::AbstractDict])

Create a Training object from a configuration dictionary metadata or, alternatively, from a configuration dictionary stored at path in TOML format. The optional argument vars is a dictionary of variables that can be used to fill the template given in path.

The parser::Parser handles conversion from configuration variables to Julia objects.

StreamlinerCore.Streaming — Type

Streaming(parser::Parser, metadata::AbstractDict)

Streaming(parser::Parser, path::AbstractString, [vars::AbstractDict])

Create a Streaming object from a configuration dictionary metadata or, alternatively, from a configuration dictionary stored at path in TOML format. The optional argument vars is a dictionary of variables that can be used to fill the template given in path.

The parser::Parser handles conversion from configuration variables to Julia objects.

Training and evaluation

StreamlinerCore.Result — Type

@kwdef struct Result{P}
    iteration::Int
    iterations::Int
    stats::NTuple{P, Vector{Float64}}
    trained::Bool
    resumed::Maybe{Bool} = nothing
    successful::Maybe{Bool} = nothing
end

Structure to encode the result of train, finetune, or validate. Stores configuration of model, metrics, and information on the location of the model weights.

StreamlinerCore.has_weights — Function

has_weights(result::Result)

Return true if result is a successful training result, false otherwise.
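
The following standalone mirror of Result, renamed MockResult, shows how the keyword constructor fills in the optional fields. The body of mock_has_weights is a hypothetical reading of the description above (the model was trained and the run is flagged as successful). It is not the library's code.

```julia
const Maybe{T} = Union{T, Nothing}

# Mirror of the documented Result struct; P is the number of datasets
# for which statistics are stored.
Base.@kwdef struct MockResult{P}
    iteration::Int
    iterations::Int
    stats::NTuple{P, Vector{Float64}}
    trained::Bool
    resumed::Maybe{Bool} = nothing
    successful::Maybe{Bool} = nothing
end

# Hypothetical: a successful training result is one that was trained
# and explicitly flagged as successful.
mock_has_weights(r::MockResult) = r.trained && r.successful === true

r = MockResult(iteration = 10, iterations = 10, stats = ([0.5], [0.6]),
               trained = true, successful = true)
# r isa MockResult{2}, r.resumed === nothing, mock_has_weights(r) == true
```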

StreamlinerCore.train — Function

train(
    dir::AbstractString,
    model::Model, data::AbstractData{2}, training::Training;
    callback = default_callback
)

Train model using the training configuration on data. Save the resulting weights in dir.

After every epoch, train calls callback(m, trace).

The arguments of callback work as follows.

m is the instantiated neural network or machine,
trace is an object encoding additional information, i.e.,
- stats (average of metrics computed so far),
- metrics (functions used to compute stats), and
- iteration.
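
A sketch of a custom callback that records the training history. The trace is mocked as a NamedTuple with fields stats, metrics, and iteration. The real trace type, the layout of stats (here one vector per dataset), and whether train uses the callback's return value are all assumptions. make_history_callback and mse are hypothetical names.

```julia
# Build a callback with the documented signature callback(m, trace) that
# appends the iteration and a copy of the current stats to `history`.
function make_history_callback(history::Vector{Any})
    return function (m, trace)
        # Copy the stats so that later in-place updates do not alter the record.
        push!(history, (iteration = trace.iteration, stats = deepcopy(trace.stats)))
        return nothing
    end
end

mse(yhat, y) = sum(abs2, yhat .- y) / length(y)

history = Any[]
callback = make_history_callback(history)
# Simulated calls, as train would make after each epoch (mock values;
# the model argument is unused here).
callback(nothing, (stats = ([0.9], [1.1]), metrics = (mse,), iteration = 1))
callback(nothing, (stats = ([0.5], [0.7]), metrics = (mse,), iteration = 2))
```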

StreamlinerCore.finetune — Function

finetune(
    (src, dst)::Pair,
    model::Model, data::AbstractData{2}, training::Training;
    init::Maybe{Result} = nothing, callback = default_callback
)

Load the model encoded in model from src and retrain it using the training configuration on data. Save the resulting weights in dst.

Use init = result::Result to resume training from where it left off. The callback keyword argument works as in train.

StreamlinerCore.loadmodel — Function

loadmodel(model::Model, data::AbstractData, device)

Load the model encoded in model onto device. The object data is required because the model can only be initialized once the data dimensions are known.

loadmodel(dirname::AbstractString, model::Model, data::AbstractData, device)

Load the model encoded in model, with weights saved in dirname, onto device. The object data is required because the model can only be initialized once the data dimensions are known.

StreamlinerCore.validate — Function

validate(
    dir::AbstractString,
    model::Model,
    data::AbstractData{1},
    streaming::Streaming
)

Load model with weights saved in dir and validate it on data using streaming settings streaming.

StreamlinerCore.evaluate — Function

evaluate(
    device_m, data::AbstractData{1}, streaming::Streaming,
    select::SymbolTuple = (:prediction,)
)

Evaluate model device_m on data using streaming settings streaming.

evaluate(
    dirname::AbstractString,
    model::Model, data::AbstractData{1}, streaming::Streaming,
    select::SymbolTuple = (:prediction,)
)

Load model with weights saved in dirname and evaluate it on data using streaming settings streaming.

StreamlinerCore.summarize — Function

summarize(io::IO, model::Model, data::AbstractData, training::Training)

Display summary information concerning model (structure and number of parameters) and data (number of batches and size of each batch).