Title: | Location and Scale Invariant Power Transformations |
---|---|
Description: | Location- and scale-invariant Box-Cox and Yeo-Johnson power transformations allow for transforming variables with distributions distant from 0 to normality. Transformers are implemented as S4 objects. These allow for transforming new instances to normality after optimising fitting parameters on other data. A test for central normality allows for rejecting transformations that fail to produce a suitably normal distribution, independent of sample size. |
Authors: | Alex Zwanenburg [aut, cre] |
Maintainer: | Alex Zwanenburg <[email protected]> |
License: | EUPL |
Version: | 1.0.0 |
Built: | 2025-03-11 09:23:54 UTC |
Source: | https://github.com/oncoray/power.transform |
Not all data can be reasonably transformed to normality using a power transformation. For example, uniformly distributed or multimodal data cannot be transformed to normality. This function computes a p-value for an empirical goodness-of-fit test for central normality. A distribution is centrally normal if the central 80% of the data are approximately normally distributed. The null hypothesis is that the transformed distribution is centrally normal.
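The idea of central normality can be illustrated with a minimal base-R sketch. It robustly standardises already-transformed values (median and MAD, which is an assumption on our part), compares the observed z-scores with expected standard normal quantiles, and averages the absolute residuals over the central 80% of the data only. The helper `central_residual_statistic` is hypothetical: it returns a statistic, not a p-value, and does not reproduce the empirical null distribution that `assess_transformation` uses.

```r
# Hypothetical helper: mean absolute residual between observed and expected
# z-scores, restricted to the central part of the (transformed) data.
central_residual_statistic <- function(y, centre_width = 0.80) {
  y <- sort(y[is.finite(y)])
  n <- length(y)
  # Plotting positions and the corresponding expected standard normal quantiles.
  p <- stats::ppoints(n)
  z_expected <- stats::qnorm(p)
  # Robust standardisation of the observed values (assumption: median and MAD).
  z_observed <- (y - stats::median(y)) / stats::mad(y)
  # Keep only observations whose plotting position lies in the central fraction.
  tail_fraction <- (1 - centre_width) / 2
  central <- p >= tail_fraction & p <= 1 - tail_fraction
  mean(abs(z_observed[central] - z_expected[central]))
}

set.seed(1)
x <- exp(stats::rnorm(1000))
normal_stat <- central_residual_statistic(log(x))  # log-normal data after log transform
skewed_stat <- central_residual_statistic(x)       # untransformed, strongly skewed data
```

The transformed (log) data yield a much smaller statistic than the untransformed, skewed data, because only the central part of the distribution is compared with the normal distribution.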
assess_transformation(x, transformer, verbose = TRUE, ...)
x |
A vector with numeric values that should be transformed to normality. |
transformer |
A transformer object created using find_transformation_parameters. |
verbose |
Sets verbosity of the function. |
... |
Unused arguments. |
p-value of the empirical goodness-of-fit test.
x <- exp(stats::rnorm(1000))
transformer <- find_transformation_parameters(
  x = x,
  method = "box_cox")
assess_transformation(
  x = x,
  transformer = transformer)
Creates skeleton transformer objects, i.e. transformer objects without fitted parameters. This is primarily intended for creating transformers externally, when the transformation parameters are already known.
create_transformer_skeleton(method, lambda = 1, shift = 0, scale = 1)
method |
Transformation method. Can be "none", "box_cox" or "yeo_johnson". |
lambda |
Value of the transformation parameter lambda. Can also be changed using the set_lambda method. |
shift |
Value of the shift parameter. Can also be changed using the set_shift method. |
scale |
Value of the scale parameter. Can also be changed using the set_scale method. |
A transformer object
find_transformation_parameters is used to find optimal parameters for univariate transformation to normality.
find_transformation_parameters( x, method = "yeo_johnson", robust = TRUE, invariant = TRUE, lambda = c(-4, 6), empirical_gof_normality_p_value = NULL, ... )
x |
A vector with numeric values. |
method |
One of the following methods for power transformation: "box_cox" (Box and Cox, 1964) or "yeo_johnson" (Yeo and Johnson, 2000). |
robust |
Flag for using a robust version of Box-Cox or Yeo-Johnson transformation, as defined by Raymaekers and Rousseeuw (2021). This version is less sensitive to the presence of outliers. |
invariant |
Flag for using a version of Box-Cox or Yeo-Johnson transformation that simultaneously optimises location and scale in addition to the lambda parameter. |
lambda |
Single lambda value, or range of lambda values that should be considered. Default: c(-4.0, 6.0). Can be |
empirical_gof_normality_p_value |
Significance value for the empirical goodness-of-fit test for central normality. The p-value is computed through the assess_transformation function. |
... |
Unused parameters. |
A transformer object that can be used to transform values.
Yeo, I. & Johnson, R. A. A new family of power transformations to improve normality or symmetry. Biometrika 87, 954–959 (2000).
Box, G. E. P. & Cox, D. R. An analysis of transformations. J. R. Stat. Soc. Series B Stat. Methodol. 26, 211–252 (1964).
Raymaekers, J. & Rousseeuw, P. J. Transforming variables to central normality. Mach. Learn. (2021).
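For the conventional (non-robust, non-invariant) Box-Cox transformation, lambda is classically estimated by maximising the profile log-likelihood over the allowed range, here the default c(-4, 6). The base-R sketch below does this with `stats::optimize`. The helper names are hypothetical, and neither the robust variant of Raymaekers and Rousseeuw nor the simultaneous optimisation of shift and scale (invariant = TRUE) is implemented here.

```r
# Box-Cox transformation of strictly positive values.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda, assuming the transformed data are normal.
box_cox_loglik <- function(lambda, x) {
  n <- length(x)
  y <- box_cox(x, lambda)
  sigma2 <- mean((y - mean(y))^2)  # maximum likelihood variance estimate
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

# Hypothetical helper: maximise the log-likelihood over a range of lambda values.
estimate_lambda <- function(x, lambda_range = c(-4, 6)) {
  stats::optimize(box_cox_loglik, interval = lambda_range, x = x,
                  maximum = TRUE)$maximum
}

set.seed(1)
x <- exp(stats::rnorm(1000))
lambda_hat <- estimate_lambda(x)
```

For log-normally distributed data the estimate lies close to 0, i.e. close to a log transformation.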
x <- exp(stats::rnorm(1000))
transformer <- find_transformation_parameters(
  x = x,
  method = "box_cox")
Get the lambda value of a transformer object.
get_lambda(object, ...)
## S4 method for signature 'transformationPowerTransform'
get_lambda(object, ...)
## S4 method for signature 'transformationBoxCox'
get_lambda(object, ...)
## S4 method for signature 'transformationYeoJohnson'
get_lambda(object, ...)
object |
Transformer object |
... |
Unused arguments |
Lambda value of the transformer.
Computes the residuals of a transformation to normality.
get_residuals(x, transformer, ...)
x |
A vector with numeric values that should be transformed to normality. |
transformer |
A transformer object created using find_transformation_parameters. |
... |
Unused arguments. |
A data.table containing the expected (according to a normal distribution) and observed z-scores, and their difference as residuals.
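The residuals described above can be approximated in base R. This sketch assumes that observed z-scores are obtained by standardising the transformed values with their mean and standard deviation, and that expected z-scores are standard normal quantiles at `stats::ppoints` positions; the exact standardisation used by get_residuals may differ (it may, for instance, be robust). The helper `qq_residuals` is hypothetical, returns a data.frame instead of a data.table, and uses a log transformation (Box-Cox with lambda = 0) in place of a fitted transformer.

```r
# Hypothetical helper: expected vs observed z-scores and their residuals.
qq_residuals <- function(y) {
  y <- sort(y[is.finite(y)])
  z_observed <- (y - mean(y)) / stats::sd(y)
  z_expected <- stats::qnorm(stats::ppoints(length(y)))
  data.frame(
    expected = z_expected,
    observed = z_observed,
    residual = z_observed - z_expected  # sign convention is an assumption
  )
}

set.seed(1)
x <- exp(stats::rnorm(1000))
residual_data <- qq_residuals(log(x))  # log(x) is Box-Cox with lambda = 0
```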
x <- exp(stats::rnorm(1000))
transformer <- find_transformation_parameters(
  x = x,
  method = "box_cox")
residual_data <- get_residuals(
  x = x,
  transformer = transformer)
Get the scale value of a transformer object.
get_scale(object, ...)
## S4 method for signature 'transformationPowerTransform'
get_scale(object, ...)
## S4 method for signature 'transformationBoxCox'
get_scale(object, ...)
## S4 method for signature 'transformationYeoJohnson'
get_scale(object, ...)
object |
Transformer object |
... |
Unused arguments |
Scale value of the transformer.
Get the shift value of a transformer object.
get_shift(object, ...)
## S4 method for signature 'transformationPowerTransform'
get_shift(object, ...)
## S4 method for signature 'transformationBoxCox'
get_shift(object, ...)
## S4 method for signature 'transformationYeoJohnson'
get_shift(object, ...)
object |
Transformer object |
... |
Unused arguments |
Shift value of the transformer.
Get the transformation method of a transformer object.
get_transformation_method(object, ...)
## S4 method for signature 'transformationPowerTransform'
get_transformation_method(object, ...)
object |
Transformer object |
... |
Unused arguments |
Transformation method
Iteratively computes M-estimates for location and scale. These are robust estimates of the mean and standard deviation of the data.
huber_estimate(x, k = 1.28, tol = 1e-04)
x |
Vector of numeric values for which the location and scale should be estimated. |
k |
Numeric value > 0 that determines the standardised distance from the location beyond which values are winsorized. |
tol |
Tolerance for the iterative procedure. |
List with location estimate "mu" and scale estimate "sigma".
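A minimal sketch of iterative Huber M-estimation of location and scale (Huber's "proposal 2"), using the same defaults k = 1.28 and tol = 1e-04 as huber_estimate. Standardised residuals are winsorized at plus or minus k, the location is updated with the mean of the winsorized residuals, and the scale is updated with a consistency factor so that it estimates the standard deviation for normal data. Whether huber_estimate uses this particular scale update and correction is an assumption, and `huber_sketch` is a hypothetical name.

```r
# Hypothetical helper: Huber M-estimates of location ("mu") and scale ("sigma").
huber_sketch <- function(x, k = 1.28, tol = 1e-04, max_iter = 100) {
  x <- x[is.finite(x)]
  mu <- stats::median(x)
  sigma <- stats::mad(x)
  if (sigma == 0) return(list(mu = mu, sigma = sigma))
  # Consistency factor E[psi_k(Z)^2] for Z ~ N(0, 1).
  beta <- (2 * stats::pnorm(k) - 1) - 2 * k * stats::dnorm(k) +
    2 * k^2 * (1 - stats::pnorm(k))
  for (i in seq_len(max_iter)) {
    # Winsorized standardised residuals (Huber psi function).
    r <- pmin(pmax((x - mu) / sigma, -k), k)
    mu_new <- mu + sigma * mean(r)
    sigma_new <- sigma * sqrt(mean(r^2) / beta)
    converged <- abs(mu_new - mu) < tol * sigma && abs(sigma_new / sigma - 1) < tol
    mu <- mu_new
    sigma <- sigma_new
    if (converged) break
  }
  list(mu = mu, sigma = sigma)
}

set.seed(1)
est <- huber_sketch(stats::rnorm(10000))
set.seed(1)
contaminated <- c(stats::rnorm(1000), rep(100, 20))
est_contaminated <- huber_sketch(contaminated)
```

Unlike the sample mean, the location estimate barely moves when 2% of the values are replaced by a gross outlier.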
Creates a figure that plots the expected (theoretical) normal quantiles (z-scores) against the observed quantiles (z-scores) of the data.
plot_qq_plot( x, transformer, show_original = TRUE, show_identity = TRUE, use_alpha = TRUE, ggtheme = NULL )
x |
A vector with numeric values that should be transformed to normality. |
transformer |
A transformer object created using find_transformation_parameters. |
show_original |
Show quantiles for original, untransformed, data in addition to transformed data. |
show_identity |
Show identity line that indicates equivalence between expected and observed quantiles. |
use_alpha |
Use transparency for points in case the data contains many instances. |
ggtheme |
|
A ggplot2 plot object for a Q-Q plot.
x <- exp(stats::rnorm(1000))
transformer <- find_transformation_parameters(
  x = x,
  method = "box_cox")
if (rlang::is_installed("ggplot2")) {
  plot_qq_plot(
    x = x,
    transformer = transformer
  )
}
Create a figure that plots the residuals of the data. These residuals are the difference between expected normal quantiles and observed quantiles.
plot_residual_plot( x, transformer, centre_width = NULL, show_original = TRUE, use_alpha = TRUE, use_absolute_deviation = TRUE, ggtheme = NULL )
x |
A vector with numeric values that should be transformed to normality. |
transformer |
A transformer object created using find_transformation_parameters. |
centre_width |
A numeric value between 0.0 and 1.0 that describes the width of the centre of the data. Can be NULL. |
show_original |
Show residuals for original, untransformed, data in addition to transformed data. |
use_alpha |
Use transparency for points in case the data contains many instances. |
use_absolute_deviation |
Plot absolute deviation instead of residuals. |
ggtheme |
|
A ggplot2 plot object for a residual plot.
x <- exp(stats::rnorm(1000))
transformer <- find_transformation_parameters(
  x = x,
  method = "box_cox"
)
if (rlang::is_installed("ggplot2")) {
  plot_residual_plot(
    x = x,
    transformer = transformer
  )
  # Plot only central 80% of the data.
  plot_residual_plot(
    x = x,
    transformer = transformer,
    centre_width = 0.80,
    show_original = FALSE
  )
}
power_transform transforms numeric values to normality.
power_transform(x, transformer = NULL, oob_action = "na", ...)
x |
A vector with numeric values that should be transformed to normality. |
transformer |
A transformer object created using find_transformation_parameters. |
oob_action |
Action that should be taken when out-of-bounds values are encountered in x (default: "na"). This argument has no effect for Yeo-Johnson transformations. |
... |
Arguments passed on to find_transformation_parameters. |
A vector of transformed values of x.
find_transformation_parameters
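A sketch of how a fitted Box-Cox transformer might be applied to new values, assuming that the location- and scale-invariant variant first computes (x - shift) / scale and then applies the standard Box-Cox transformation. Values for which this quantity is not strictly positive lie outside the domain of the Box-Cox transformation; the sketch sets them to NA, which only mirrors the spirit of the default oob_action = "na". The function `apply_box_cox` is hypothetical.

```r
# Hypothetical helper: location- and scale-invariant Box-Cox transformation.
apply_box_cox <- function(x, lambda, shift = 0, scale = 1) {
  u <- (x - shift) / scale
  y <- rep(NA_real_, length(u))
  # Box-Cox is only defined for strictly positive input; other values stay NA.
  ok <- !is.na(u) & u > 0
  if (abs(lambda) < 1e-8) {
    y[ok] <- log(u[ok])
  } else {
    y[ok] <- (u[ok]^lambda - 1) / lambda
  }
  y
}

y <- apply_box_cox(c(-1, 0, 1, exp(1)), lambda = 0)
```

Here the first two values are out of bounds and become NA, whereas 1 and exp(1) map to 0 and 1.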
x <- exp(stats::rnorm(1000))
y <- power_transform(
  x = x,
  method = "box_cox")
This package was originally based on, and contains code from, the familiar package (https://cran.r-project.org/package=familiar), under the EUPL license.
Maintainer: Alex Zwanenburg [email protected] (ORCID)
Authors:
Steffen Löck
Other contributors:
German Cancer Research Center (DKFZ) [copyright holder]
Useful links:
Report bugs at https://github.com/oncoray/power.transform/issues
Draws random values from an asymmetric generalised normal distribution.
ragn(n, location = 0, scale = 1, alpha = 0.5, beta = 2)
n |
number of instances |
location |
central location of the distribution |
scale |
scale of the distribution. Must be strictly positive (scale > 0). |
alpha |
value between 0.0 and 1.0 that determines the skewness of the distribution. |
beta |
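Strictly positive value (beta > 0) that determines the shape of the distribution, for example beta = 2 for a normal and beta = 1 for a Laplace distribution. |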
Random values drawn according to an asymmetric generalised normal distribution. Here the asymmetric generalised normal distribution is a symmetric generalised normal distribution that is made asymmetric using the procedure described by Gijbels et al. To generate random values, we use the quantile function of the symmetric generalised normal distribution derived by M. Griffin.
The default parameter values produce values drawn from a normal distribution with a standard deviation of √2 instead of 1. To draw values from the standard normal distribution, set scale = 1/sqrt(2), as in the examples.
One or more numeric values drawn from the asymmetric generalised normal distribution.
Gijbels I, Karim R, Verhasselt A. Quantile Estimation in a Generalized
Griffin M (2018). gnorm: Generalized Normal/Exponential Power Distribution.
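The sampling procedure can be sketched in base R by inverse transform sampling. The sketch assumes the quantile-based asymmetric construction of Gijbels et al., in which alpha is the probability mass below the location, and obtains the symmetric generalised normal quantile function from the fact that |X / scale|^beta follows a Gamma(1 / beta, 1) distribution. Whether ragn uses exactly this parametrisation is an assumption; the helper names are hypothetical. With alpha = 0.5 and beta = 2, a factor 2 from the asymmetric construction and the variance of 1/2 of the unit generalised normal distribution combine, so that scale = 1/sqrt(2) yields unit standard deviation, consistent with the examples.

```r
# Hypothetical helper: quantile function of the symmetric generalised normal
# distribution centred at 0, using |X / scale|^beta ~ Gamma(1 / beta, 1).
qgnorm_symmetric <- function(p, scale = 1, beta = 2) {
  sign(p - 0.5) * scale * stats::qgamma(abs(2 * p - 1), shape = 1 / beta)^(1 / beta)
}

# Hypothetical sampler for the quantile-based asymmetric version.
# alpha is the probability mass below the location parameter.
ragn_sketch <- function(n, location = 0, scale = 1, alpha = 0.5, beta = 2) {
  p <- stats::runif(n)
  x <- numeric(n)
  lower <- p <= alpha
  # Left of the location: stretched by 1 / (1 - alpha).
  x[lower] <- qgnorm_symmetric(p[lower] / (2 * alpha), beta = beta) / (1 - alpha)
  # Right of the location: stretched by 1 / alpha.
  x[!lower] <- qgnorm_symmetric((1 + p[!lower] - 2 * alpha) / (2 * (1 - alpha)),
                                beta = beta) / alpha
  location + scale * x
}

set.seed(1)
x_normal <- ragn_sketch(1e5, scale = 1 / sqrt(2))
set.seed(1)
x_left_skewed <- ragn_sketch(1e5, scale = 1 / sqrt(2), alpha = 0.8)
```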
# Draw values from a standard normal distribution.
x <- power.transform::ragn(n = 10000, scale = 1/sqrt(2))
hist(x, 50)
# Draw values from a left-skewed normal distribution.
x <- power.transform::ragn(n = 10000, scale = 1/sqrt(2), alpha = 0.8)
hist(x, 50)
# Draw values from a right-skewed normal distribution.
x <- power.transform::ragn(n = 10000, scale = 1/sqrt(2), alpha = 0.2)
hist(x, 50)
# Draw values from a standard Laplace distribution.
x <- power.transform::ragn(n = 10000, scale = 1/sqrt(2), beta = 1.0)
hist(x, 50)
revert_power_transform reverts the transformation of numeric values to normality.
revert_power_transform(y, transformer)
y |
A vector with numeric values that were previously transformed to normality. |
transformer |
A transformer object created using find_transformation_parameters. |
A vector of values.
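Reverting a Box-Cox transformation amounts to solving y = (u^lambda - 1) / lambda for u and then undoing the location and scale step. The sketch below assumes that the forward transformation is the Box-Cox transformation of (x - shift) / scale; the functions `box_cox_forward` and `box_cox_inverse` are hypothetical and only illustrate the round trip that revert_power_transform performs.

```r
# Hypothetical forward transformation: Box-Cox of (x - shift) / scale.
box_cox_forward <- function(x, lambda, shift = 0, scale = 1) {
  u <- (x - shift) / scale
  if (abs(lambda) < 1e-8) log(u) else (u^lambda - 1) / lambda
}

# Hypothetical inverse: solve for u, then undo scaling and shifting.
box_cox_inverse <- function(y, lambda, shift = 0, scale = 1) {
  u <- if (abs(lambda) < 1e-8) exp(y) else (lambda * y + 1)^(1 / lambda)
  u * scale + shift
}

set.seed(1)
x0 <- exp(stats::rnorm(1000))
y <- box_cox_forward(x0, lambda = 0.3, shift = -0.5, scale = 2)
x1 <- box_cox_inverse(y, lambda = 0.3, shift = -0.5, scale = 2)
```

Values of y for which lambda * y + 1 is not positive have no inverse under this parametrisation and yield NaN.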
x0 <- exp(stats::rnorm(1000))
transformer <- find_transformation_parameters(
  x = x0,
  method = "box_cox")
y <- power_transform(
  x = x0,
  transformer = transformer)
x1 <- revert_power_transform(
  y = y,
  transformer = transformer)
Set the lambda value of a transformer object.
set_lambda(object, lambda, ...)
## S4 method for signature 'transformationPowerTransform'
set_lambda(object, lambda, ...)
## S4 method for signature 'transformationBoxCox'
set_lambda(object, lambda, ...)
## S4 method for signature 'transformationYeoJohnson'
set_lambda(object, lambda, ...)
object |
Transformer object |
lambda |
Lambda value |
... |
Unused arguments |
Transformer object with updated lambda value.
Set the scale value of a transformer object.
set_scale(object, scale, ...)
## S4 method for signature 'transformationPowerTransform'
set_scale(object, scale, ...)
## S4 method for signature 'transformationBoxCox'
set_scale(object, scale, ...)
## S4 method for signature 'transformationYeoJohnson'
set_scale(object, scale, ...)
object |
Transformer object |
scale |
Scale value |
... |
Unused arguments |
Transformer object with updated scale value.
Set the shift value of a transformer object.
set_shift(object, shift, ...)
## S4 method for signature 'transformationPowerTransform'
set_shift(object, shift, ...)
## S4 method for signature 'transformationBoxCox'
set_shift(object, shift, ...)
## S4 method for signature 'transformationYeoJohnson'
set_shift(object, shift, ...)
object |
Transformer object |
shift |
Shift value |
... |
Unused arguments |
Transformer object with updated shift value.
This class is used for Box-Cox transformations.
method
Main transformation method, i.e. "box_cox".
robust
Indicates whether a robust version of the Box-Cox transformation is used to set transformation parameters. The value depends on the robust argument of the find_transformation_parameters function.
lambda
Numeric lambda parameter for the Box-Cox transformation.
shift
Numeric shift parameter for the Box-Cox transformation. The value depends on the data used for setting transformation parameters. If all data are strictly positive, shift has a value of 0.0. When negative or zero values are present, data are shifted to be strictly positive. If invariant=TRUE in the find_transformation_parameters function, lambda, shift and scale parameters are optimised simultaneously.
scale
Numeric scale parameter for the Box-Cox transformation. If invariant=TRUE in the find_transformation_parameters function, lambda, shift and scale parameters are optimised simultaneously. Otherwise, the scale parameter has a value of 1.0.
complete
Indicates whether transformation parameters were set.
find_transformation_parameters
This class is for transformers that do not alter the data.
method
Main transformation method, i.e. "none".
complete
Indicates whether transformation parameters were set.
This is the superclass for transformation objects.
method
Main transformation method.
complete
Indicates whether transformation parameters were set.
version
Version of the power.transform package that was used to create the transformation objects.
This class is used for Yeo-Johnson transformations.
method
Main transformation method, i.e. "yeo_johnson".
robust
Indicates whether a robust version of the Yeo-Johnson transformation is used to set transformation parameters. The value depends on the robust argument of the find_transformation_parameters function.
lambda
Numeric lambda parameter for the Yeo-Johnson transformation.
shift
Numeric shift parameter for the Yeo-Johnson transformation. If invariant=TRUE in the find_transformation_parameters function, lambda, shift and scale parameters are optimised simultaneously. Otherwise, the shift parameter has a value of 0.0.
scale
Numeric scale parameter for the Yeo-Johnson transformation. If invariant=TRUE in the find_transformation_parameters function, lambda, shift and scale parameters are optimised simultaneously. Otherwise, the scale parameter has a value of 1.0.
complete
Indicates whether transformation parameters were set.
find_transformation_parameters
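The Yeo-Johnson transformation (Yeo and Johnson, 2000) uses separate branches for non-negative and negative values, so that it is defined on the entire real line; this is why out-of-bounds handling does not apply to it. The sketch below assumes that the location- and scale-invariant variant applies the standard Yeo-Johnson transformation to (x - shift) / scale; the function `apply_yeo_johnson` is hypothetical.

```r
# Hypothetical helper: Yeo-Johnson transformation of (x - shift) / scale.
apply_yeo_johnson <- function(x, lambda, shift = 0, scale = 1) {
  u <- (x - shift) / scale
  y <- rep(NA_real_, length(u))
  pos <- !is.na(u) & u >= 0
  neg <- !is.na(u) & u < 0
  # Branch for non-negative values.
  if (abs(lambda) < 1e-8) {
    y[pos] <- log1p(u[pos])
  } else {
    y[pos] <- ((u[pos] + 1)^lambda - 1) / lambda
  }
  # Branch for negative values, which uses 2 - lambda.
  if (abs(lambda - 2) < 1e-8) {
    y[neg] <- -log1p(-u[neg])
  } else {
    y[neg] <- -((1 - u[neg])^(2 - lambda) - 1) / (2 - lambda)
  }
  y
}

identity_check <- apply_yeo_johnson(c(-2, -0.5, 0, 3), lambda = 1)
```

With lambda = 1 both branches reduce to the identity, which is a convenient sanity check.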