StreamlinerCore
StreamlinerCore is a Julia library to generate, train, and evaluate models defined via configuration files.
Data interface
StreamlinerCore.AbstractData — Type
AbstractData{N}
Abstract type representing streamers of N datasets. In general, StreamlinerCore will use N = 1 to validate and evaluate trained models and N = 2 to train models via a training and a validation dataset.
Subtypes of AbstractData are meant to implement the following methods:
StreamlinerCore.stream — Function
stream(f, data::AbstractData, partition::Integer, streaming::Streaming)
Stream partition of data by batches of batchsize on a given device. Return the result of applying f to the resulting batch iterator. Shuffling is optional and controlled by shuffle (boolean) and by the random number generator rng.
The options device, batchsize, shuffle, rng are passed via the configuration struct streaming::Streaming. See also Streaming.
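Because f is the first argument, a function with this shape can be called with Julia's do-block syntax. The sketch below is a self-contained toy that mirrors only that calling pattern. The name toy_stream is hypothetical and stands in for illustration; it is not StreamlinerCore.stream and it ignores partition, device, shuffle, and rng.

```julia
# Toy stand-in for illustration only; this is NOT StreamlinerCore.stream.
# It splits `samples` into batches and applies `f` to the batch iterator.
function toy_stream(f, samples::AbstractVector, batchsize::Integer)
    batches = Iterators.partition(samples, batchsize)
    return f(batches)
end

# Because `f` comes first, the caller can pass it as a do-block.
nbatches = toy_stream(collect(1:10), 4) do batches
    count(_ -> true, batches)  # number of batches in the iterator
end

println(nbatches)  # 10 samples in batches of 4 give 3 batches
```

The value of the do-block is what the call returns, which matches the "return the result of applying f" behaviour described above.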
StreamlinerCore.ingest — Function
ingest(data::AbstractData{1}, eval_stream, select)
Ingest the output of evaluate into a suitable database, tensor or iterator. select determines which fields of the model output to keep.
StreamlinerCore.get_templates — Function
get_templates(data::AbstractData)
Extract templates for data. Templates encode type and size of the arrays that data will stream. See also Template.
StreamlinerCore.get_metadata — Function
get_metadata(x)::Dict{String, Any}
Extract metadata for x. metadata should be a dictionary of information that identifies x uniquely. get_metadata has methods for AbstractData, Model, and Training.
StreamlinerCore.get_nsamples — Function
get_nsamples(data::AbstractData{N})::NTuple{N, Int} where {N}
Return number of samples for data.
StreamlinerCore.Template — Type
Template(::Type{T}, size::NTuple{N, Int}) where {T, N}
Create an object of type Template. It represents arrays with eltype T and size size. Note that size does not include the minibatch dimension.
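To make the note about the minibatch dimension concrete, here is a small self-contained sketch. ToyTemplate and batch_like are hypothetical names defined only for this example; they are not StreamlinerCore.Template. The sketch also assumes that the minibatch dimension is appended as the last dimension, which the text above does not state.

```julia
# Toy stand-in for illustration only; this is NOT StreamlinerCore.Template.
struct ToyTemplate{T, N}
    size::NTuple{N, Int}
end

# Constructor mirroring the documented shape Template(::Type{T}, size).
ToyTemplate(::Type{T}, size::NTuple{N, Int}) where {T, N} = ToyTemplate{T, N}(size)

# Assumption: the minibatch dimension is appended as the last dimension.
function batch_like(t::ToyTemplate{T}, batchsize::Integer) where {T}
    return zeros(T, t.size..., batchsize)
end

t = ToyTemplate(Float32, (28, 28, 1))
x = batch_like(t, 16)

println(size(x))    # (28, 28, 1, 16)
println(eltype(x))  # Float32
```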
Parser
StreamlinerCore.Parser — Type
Parser(;
    model, layers, sigmas, aggregators, metrics, regularizations,
    optimizers, schedules, stoppers, devices
)
Collection of dictionaries to perform the necessary conversion from the user-specified configuration file or dictionary to Julia objects.
For most use cases, one should define a default parser
    parser = default_parser()
and pass it to Model and Training upon construction.
A parser object is also required to use interface functions that read from MongoDB.
See default_parser for more advanced uses.
StreamlinerCore.default_parser — Function
default_parser(; plugins::AbstractVector{Parser}=Parser[])
Return a parser::Parser object that includes StreamlinerCore defaults together with optional plugins.
Parsed objects
StreamlinerCore.Model — Type
Model(parser::Parser, metadata::AbstractDict)
Model(parser::Parser, path::AbstractString, [vars::AbstractDict])
Create a Model object from a configuration dictionary metadata or, alternatively, from a configuration dictionary stored at path in TOML format. The optional argument vars is a dictionary of variables that can be used to fill the template given in path.
The parser::Parser handles conversion from configuration variables to Julia objects.
Given a model::Model object, use model(data) where data::AbstractData to instantiate the corresponding neural network or machine.
StreamlinerCore.Training — Type
Training(parser::Parser, metadata::AbstractDict)
Training(parser::Parser, path::AbstractString, [vars::AbstractDict])
Create a Training object from a configuration dictionary metadata or, alternatively, from a configuration dictionary stored at path in TOML format. The optional argument vars is a dictionary of variables that can be used to fill the template given in path.
The parser::Parser handles conversion from configuration variables to Julia objects.
StreamlinerCore.Streaming — Type
Streaming(parser::Parser, metadata::AbstractDict)
Streaming(parser::Parser, path::AbstractString, [vars::AbstractDict])
Create a Streaming object from a configuration dictionary metadata or, alternatively, from a configuration dictionary stored at path in TOML format. The optional argument vars is a dictionary of variables that can be used to fill the template given in path.
The parser::Parser handles conversion from configuration variables to Julia objects.
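The three constructors share the same shape, so a single parser can be reused for all of them. The following is a minimal sketch that relies only on the path-based signatures documented above. The helper load_configs and the file names model.toml, training.toml, and streaming.toml are hypothetical placeholders, and the contents of those files are not covered here.

```julia
using StreamlinerCore: default_parser, Model, Training, Streaming

# Hypothetical helper: build the three parsed objects from TOML files in `dir`.
# The file names are placeholders, not names required by the library.
function load_configs(dir::AbstractString)
    parser = default_parser()
    model = Model(parser, joinpath(dir, "model.toml"))
    training = Training(parser, joinpath(dir, "training.toml"))
    streaming = Streaming(parser, joinpath(dir, "streaming.toml"))
    return (; model, training, streaming)
end
```

Returning a named tuple keeps the three objects together, so a caller can write `cfg = load_configs(dir)` and then use `cfg.model`, `cfg.training`, and `cfg.streaming`.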
Training and evaluation
StreamlinerCore.Result — Type
@kwdef struct Result{P}
    iteration::Int
    iterations::Int
    stats::NTuple{P, Vector{Float64}}
    trained::Bool
    resumed::Maybe{Bool} = nothing
    successful::Maybe{Bool} = nothing
end
Structure to encode the result of train, finetune, or validate. Stores configuration of model, metrics, and information on the location of the model weights.
StreamlinerCore.has_weights — Function
has_weights(result::Result)
Return true if result is a successful training result, false otherwise.
StreamlinerCore.train — Function
train(
    dir::AbstractString,
    model::Model, data::AbstractData{2}, training::Training;
    callback = default_callback
)
Train model using the training configuration on data. Save the resulting weights in dir.
After every epoch, call callback(m, trace).
The arguments of callback work as follows.
m is the instantiated neural network or machine,
trace is an object encoding additional information, i.e., stats (average of metrics computed so far), metrics (functions used to compute stats), and iteration.
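A custom callback can be used to log progress after each epoch. The sketch below uses only the train and has_weights signatures documented on this page. The names log_progress and train_and_check are hypothetical, and it is an assumption that trace exposes iteration and stats as properties and that the callback's return value is ignored.

```julia
using StreamlinerCore: AbstractData, Model, Training, train, has_weights

# Hypothetical callback. Assumption: `trace` exposes `iteration` and `stats`
# as properties, following the description of its contents above.
function log_progress(m, trace)
    println("iteration ", trace.iteration, ": stats = ", trace.stats)
end

# Hypothetical wrapper: train, then report whether weights were produced.
function train_and_check(dir::AbstractString, model::Model,
                         data::AbstractData{2}, training::Training)
    result = train(dir, model, data, training; callback = log_progress)
    has_weights(result) || @warn "Training did not produce usable weights" dir
    return result
end
```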
StreamlinerCore.finetune — Function
finetune(
    (src, dst)::Pair,
    model::Model, data::AbstractData{2}, training::Training;
    init::Maybe{Result} = nothing, callback = default_callback
)
Load model encoded in model from src and retrain it using the training configuration on data. Save the resulting weights in dst.
Use init = result::Result to restart training where it left off. The callback keyword argument works as in train.
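The first argument is a Pair, so the source and destination directories are written as src => dst. The wrapper below is a minimal sketch based on the signature above. The name continue_training is hypothetical, and typing init as Union{Result, Nothing} assumes that Maybe{Result} accepts either a Result or nothing, as its default value suggests.

```julia
using StreamlinerCore: AbstractData, Model, Training, Result, finetune

# Hypothetical wrapper: retrain weights stored in `src` and save them in `dst`.
# Pass a previous `Result` as `init` to restart training where it left off.
function continue_training(src::AbstractString, dst::AbstractString, model::Model,
                           data::AbstractData{2}, training::Training;
                           init::Union{Result, Nothing} = nothing)
    return finetune(src => dst, model, data, training; init = init)
end
```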
StreamlinerCore.loadmodel — Function
loadmodel(model::Model, data::AbstractData, device)
Load model encoded in model on the device. The object data is required as the model can only be initialized once the data dimensions are known.
loadmodel(dirname::AbstractString, model::Model, data::AbstractData, device)
Load model encoded in model, with weights saved in dirname, on the device. The object data is required as the model can only be initialized once the data dimensions are known.
StreamlinerCore.validate — Function
validate(
    dir::AbstractString,
    model::Model,
    data::AbstractData{1},
    streaming::Streaming
)
Load model with weights saved in dir and validate it on data using streaming settings streaming.
StreamlinerCore.evaluate — Function
evaluate(
    device_m, data::AbstractData{1}, streaming::Streaming,
    select::SymbolTuple = (:prediction,)
)
Evaluate model device_m on data using streaming settings streaming.
evaluate(
    dirname::AbstractString,
    model::Model, data::AbstractData{1}, streaming::Streaming,
    select::SymbolTuple = (:prediction,)
)
Load model with weights saved in dirname and evaluate it on data using streaming settings streaming.
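Validation and evaluation take the same directory, model, data, and streaming arguments, so they can be chained on a trained model. The following is a minimal sketch using only the signatures above. The name validate_and_predict is hypothetical, and the sketch makes no assumption about what evaluate returns beyond passing it back to the caller.

```julia
using StreamlinerCore: AbstractData, Model, Streaming, validate, evaluate

# Hypothetical wrapper: validate the weights saved in `dir`, then evaluate the
# same model on `data`, keeping only the `:prediction` field of the output.
function validate_and_predict(dir::AbstractString, model::Model,
                              data::AbstractData{1}, streaming::Streaming)
    result = validate(dir, model, data, streaming)
    predictions = evaluate(dir, model, data, streaming, (:prediction,))
    return result, predictions
end
```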
StreamlinerCore.summarize — Function
summarize(io::IO, model::Model, data::AbstractData, training::Training)
Display summary information concerning model (structure and number of parameters) and data (number of batches and size of each batch).