Title: | Train and Predict Cancer Subtype with Keras Model based on Mutational Signatures |
---|---|
Description: | Mutational signatures represent the mutational processes that occurred during cancer evolution and are therefore stable genetic resources for subtyping. This tool provides functions for training neural network models, based on the 'keras' and 'sigminer' packages, to predict the subtype a sample belongs to. |
Authors: | Shixiang Wang [aut, cre] |
Maintainer: | Shixiang Wang <[email protected]> |
License: | Apache License (>= 2.0) |
Version: | 0.2.0 |
Built: | 2024-11-14 06:01:55 UTC |
Source: | https://github.com/ShixiangWang/sigminer.prediction |
Construct A Batch of Keras Models
batch_modeling_and_fitting(data_list, param_combination, ...)
data_list |
A |
param_combination |
A parameter |
... |
Other arguments passing to modeling_and_fitting. |
A tibble.
load(system.file("extdata", "wang2020-input.RData",
  package = "sigminer.prediction", mustWork = TRUE
))
dat_list <- prepare_data(expo_all,
  col_to_vars = c(paste0("Sig", 1:5), paste0("AbsSig", 1:5)),
  col_to_label = "enrich_sig",
  label_names = paste0("Sig", 1:5)
)
pc <- expand.grid(
  c(10, 20, 50, 100),
  c(0, 0.1, 0.2, 0.3, 0.4, 0.5),
  c(10, 20, 50, 100),
  c(0, 0.1, 0.2, 0.3, 0.4, 0.5)
)
colnames(pc) <- c(
  "first_layer_unit", "second_layer_drop_rate",
  "third_layer_unit", "fourth_layer_drop_rate"
)
# Just use 2 rows for illustration
batch_res <- batch_modeling_and_fitting(dat_list,
  param_combination = pc %>% head(2)
)
batch_res
tidy(batch_res)
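The search grid grows multiplicatively with the number of candidate values, which is why the example above fits only two rows. The following base R sketch rebuilds the same grid and counts its rows. It does not call the package itself; naming the arguments of `expand.grid()` is simply an alternative to setting `colnames()` afterwards.

```r
# Candidate values for the four tunable arguments (same values as the example above)
units <- c(10, 20, 50, 100)
drop_rates <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5)

# Naming the arguments of expand.grid() sets the column names directly
pc <- expand.grid(
  first_layer_unit = units,
  second_layer_drop_rate = drop_rates,
  third_layer_unit = units,
  fourth_layer_drop_rate = drop_rates
)

# 4 * 6 * 4 * 6 combinations, one model per row
n_models <- nrow(pc)
print(n_models)

# A small subset keeps a trial run cheap
pc_small <- head(pc, 2)
print(pc_small)
```

Fitting every row of the full grid would train one model per combination, so subsetting first is a cheap way to check that the pipeline runs end to end.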
It is useful when your result model file is stored in a temporary directory and you want to keep it.
copy_model(model_file, dest)
model_file |
A file path to the model file. |
dest |
The destination file path. |
Nothing
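As a rough illustration of why this matters: files created under `tempdir()` are removed when the R session ends. The sketch below uses only base R (`file.copy()`) on a placeholder file. It is not the implementation of `copy_model()`, and the file and folder names are hypothetical.

```r
# Placeholder standing in for a trained model file saved under tempdir()
model_file <- tempfile(pattern = "keras_model", fileext = ".h5")
writeLines("placeholder, not a real HDF5 model", model_file)

# Hypothetical destination (another temp folder here, so the sketch cleans up after itself;
# in practice this would be a persistent location)
dest_dir <- file.path(tempdir(), "kept_models")
dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
dest <- file.path(dest_dir, "my_model.h5")

# Copy the file; file.copy() returns TRUE on success
copied <- file.copy(model_file, dest, overwrite = TRUE)
print(copied)
print(file.exists(dest))
```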
List Current Available Trained Keras Models
list_trained_models()
A tibble containing a summary of the available models.
list_trained_models()
Load Trained Models
load_trained_model(x)
x |
A subset from list_trained_models. |
A (list of) Keras model.
z <- list_trained_models() %>%
  head(1) %>%
  load_trained_model()
z
Create 5-layer Keras Model and Fitting Datasets
modeling_and_fitting(
  data_list,
  first_layer_unit,
  second_layer_drop_rate,
  third_layer_unit,
  fourth_layer_drop_rate,
  epochs = 30,
  batch_size = 16,
  validation_split = 0.2,
  validation_data = NULL,
  test_split = NULL,
  first_layer_activation = "relu",
  third_layer_activation = "relu",
  fifth_layer_activation = "softmax",
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(),
  metrics = c("accuracy"),
  model_file = tempfile(pattern = "keras_model", tmpdir = file.path(tempdir(), "sigminer.pred"), fileext = ".h5"),
  test_mode = FALSE
)
data_list |
A |
first_layer_unit |
Positive integer, dimensionality of the output space for the first layer. |
second_layer_drop_rate |
Float between 0 and 1. Fraction of the input units to drop for the second layer. |
third_layer_unit |
Positive integer, dimensionality of the output space for the third layer. |
fourth_layer_drop_rate |
Float between 0 and 1. Fraction of the input units to drop for the fourth layer. |
epochs |
Number of epochs to train the model, default is |
batch_size |
Integer or NULL. Number of samples per gradient update. If unspecified, batch_size will default to |
validation_split |
Float between 0 and 1. Fraction of the training data
to be used as validation data. The model will set apart this fraction of
the training data, will not train on it, and will evaluate the loss and any
model metrics on this data at the end of each epoch. The validation data
is selected from the last samples in the |
validation_data |
Data on which to evaluate the loss and any model
metrics at the end of each epoch. The model will not be trained on this
data. This could be a list (x_val, y_val) or a list (x_val, y_val,
val_sample_weights). |
test_split |
Float between 0 and 1. Fraction of all the data to be used as test data. If not set, it will be calculated automatically from the input data. This value is used for calculating total accuracy. |
first_layer_activation |
Activation function for the first layer, default is "relu". |
third_layer_activation |
Activation function for the third layer, default is "relu". |
fifth_layer_activation |
Activation function for the fifth layer, default is "softmax". |
loss |
String (name of objective function), objective function or a
|
optimizer |
String (name of optimizer) or optimizer instance. For most
models, this defaults to |
metrics |
List of metrics to be evaluated by the model during training
and testing. Each of these can be a string (name of a built-in function),
function or a |
model_file |
file path to save the model file in |
test_mode |
Default is |
A tibble.
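To make the `validation_split` behaviour concrete, here is a base R sketch of holding out the last fraction of the training rows. The matrix is made up, and the exact rounding Keras applies is an assumption. The point is only that the held-out rows come from the end of the data.

```r
# Toy training matrix: 10 samples, 3 features (made-up values)
x_train <- matrix(seq_len(30), nrow = 10, ncol = 3)
validation_split <- 0.2

# Number of rows held out at the end of the data (rounding rule is an assumption)
n <- nrow(x_train)
n_val <- round(n * validation_split)
n_fit <- n - n_val

x_fit <- x_train[seq_len(n_fit), , drop = FALSE]   # rows used for gradient updates
x_val <- x_train[(n_fit + 1):n, , drop = FALSE]    # last rows, only evaluated

print(dim(x_fit))
print(dim(x_val))
```

Because the split is positional, the order of rows in the training data decides which samples end up in the validation set.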
load(system.file("extdata", "wang2020-input.RData",
  package = "sigminer.prediction", mustWork = TRUE
))
dat_list <- prepare_data(expo_all,
  col_to_vars = c(paste0("Sig", 1:5), paste0("AbsSig", 1:5)),
  col_to_label = "enrich_sig",
  label_names = paste0("Sig", 1:5)
)
res <- modeling_and_fitting(dat_list, 20, 0, 20, 0.1)
res$history[[1]] %>% plot()

## Load model and predict
model <- load_model_hdf5(res$model_file)
model %>% predict_classes(dat_list$x_train[1, , drop = FALSE])
model %>% predict_proba(dat_list$x_train[1, , drop = FALSE])
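The argument names suggest a stack of dense, dropout, dense, dropout, and softmax output layers. The following base R sketch runs a forward pass through such a stack with random weights, only to show how the shapes and the softmax output fit together. The layer layout is inferred from the argument names rather than taken from the package source, the helper functions are hypothetical, and dropout is treated as a no-op, as it is at prediction time.

```r
set.seed(1234)

# Hypothetical helpers, not part of 'keras' or 'sigminer.prediction'
relu <- function(z) pmax(z, 0)
softmax <- function(z) {
  e <- exp(z - apply(z, 1, max))  # subtract row maxima for numerical stability
  e / rowSums(e)
}
dense <- function(x, units, activation) {
  # Random weights and zero biases stand in for trained parameters
  w <- matrix(rnorm(ncol(x) * units, sd = 0.1), nrow = ncol(x), ncol = units)
  b <- rep(0, units)
  activation(sweep(x %*% w, 2, b, "+"))
}

# Toy input: 4 samples, 10 features; 5 classes (made-up values)
x <- matrix(runif(40), nrow = 4, ncol = 10)
n_classes <- 5

h1 <- dense(x, units = 20, activation = relu)    # layer 1: first_layer_unit = 20
h2 <- h1                                         # layer 2: dropout, no-op at prediction time
h3 <- dense(h2, units = 20, activation = relu)   # layer 3: third_layer_unit = 20
h4 <- h3                                         # layer 4: dropout, no-op at prediction time
prob <- dense(h4, units = n_classes, activation = softmax)  # layer 5: softmax output

# Categorical cross-entropy against one-hot labels (made-up classes 1 to 4)
y <- diag(n_classes)[c(1, 2, 3, 4), ]
loss <- -mean(rowSums(y * log(prob)))

print(dim(prob))
print(rowSums(prob))
print(loss)
```

Each row of `prob` sums to 1, which is what lets the output be read as per-class probabilities for a sample.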
Prepare Training and Test Dataset
prepare_data(
  data,
  col_to_vars,
  col_to_label,
  label_names,
  seed = 1234,
  test_split = 0.2
)
data |
A |
col_to_vars |
A character vector specifying the predictive columns. |
col_to_label |
A column indicating the labels/classes. |
label_names |
Label/class names. The order is important. For example, "a", "b", "c" will be transformed to 0, 1, 2. |
seed |
Random seed, default is |
test_split |
A fraction of samples to be treated as the test dataset, default is |
A list containing x_train, y_train, x_test, and y_test datasets.
load(system.file("extdata", "wang2020-input.RData",
  package = "sigminer.prediction", mustWork = TRUE
))
dat_list <- prepare_data(expo_all,
  col_to_vars = c(paste0("Sig", 1:5), paste0("AbsSig", 1:5)),
  col_to_label = "enrich_sig",
  label_names = paste0("Sig", 1:5)
)
str(dat_list)
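The label encoding and the seeded split described in the arguments can be sketched in base R as follows. The data frame is made up, and the way `prepare_data()` samples rows internally is an assumption. Only the 0-based mapping by `label_names` order comes from the argument description.

```r
# Toy data: 10 samples with one feature and a class label (made-up values)
dat <- data.frame(
  Sig1 = seq(0.1, 1, by = 0.1),
  label = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a"),
  stringsAsFactors = FALSE
)
label_names <- c("a", "b", "c")

# Position in label_names, shifted to start at 0: "a" -> 0, "b" -> 1, "c" -> 2
y <- match(dat$label, label_names) - 1L

# Reproducible split: hold out a fraction of rows as the test set
set.seed(1234)
test_split <- 0.2
test_idx <- sample(nrow(dat), size = round(nrow(dat) * test_split))

x_train <- as.matrix(dat[-test_idx, "Sig1", drop = FALSE])
x_test <- as.matrix(dat[test_idx, "Sig1", drop = FALSE])
y_train <- y[-test_idx]
y_test <- y[test_idx]

str(list(x_train = x_train, y_train = y_train, x_test = x_test, y_test = y_test))
```

Changing the order of `label_names` changes which integer each class receives, so the same order has to be used when interpreting predictions.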
Tidy Modeling Result
tidy(x)
x |
A result |
A tibble.