Title: Simulation and Comparison of Dissolution Profiles
Description: Compare dissolution profiles with confidence interval of similarity factor f2 using bootstrap methodology as described in the literature, such as Efron and Tibshirani (1993, ISBN:9780412042317), Davison and Hinkley (1997, ISBN:9780521573917), and Shah et al. (1998) <doi:10.1023/A:1011976615750>. The package can also be used to simulate dissolution profiles based on mathematical modelling and multivariate normal distribution.
Authors: Zhengguo Xu [aut, cre]
Maintainer: Zhengguo Xu <[email protected]>
License: GPL (>= 3)
Version: 0.4.2
Built: 2024-11-09 05:24:03 UTC
Source: https://github.com/zhengguoxu/bootf2
bootf2: Estimate 90% Confidence Intervals of f2 with Bootstrap Methodology

Main function to estimate 90% confidence intervals of f2 using bootstrap methodology.
bootf2(test, ref, path.in, file.in, path.out, file.out, n.boots = 10000L,
       seed = 306L, digits = 2L, alpha = 0.05,
       regulation = c("EMA", "FDA", "WHO", "Canada", "ANVISA"),
       min.points = 1L, both.TR.85 = FALSE, print.report = TRUE,
       report.style = c("concise", "intermediate", "detailed"),
       f2.type = c("all", "est.f2", "exp.f2", "bc.f2", "vc.exp.f2", "vc.bc.f2"),
       ci.type = c("all", "normal", "basic", "percentile", "bca.jackknife", "bca.boot"),
       quantile.type = c("all", as.character(1:9), "boot"),
       jackknife.type = c("all", "nt+nr", "nt*nr", "nt=nr"),
       time.unit = c("min", "h"), output.to.screen = FALSE, sim.data.out = FALSE)
test, ref: Data frames of dissolution profiles of the test and reference products if path.in and file.in are not provided; otherwise, character strings of the worksheet names in the Excel file. See Input/Output in Details.

path.in, file.in, path.out, file.out: Character strings of input and output directories and file names. See Input/Output in Details.

n.boots: An integer indicating the number of bootstrap samples.

seed: Integer seed value for reproducibility. If missing, a random seed will be generated and recorded so that the results can be reproduced.

digits: An integer indicating the number of decimal places for the output.

alpha: A numeric value between 0 and 1 used to estimate the (1 - 2*alpha)*100% confidence interval; the default 0.05 corresponds to a 90% confidence interval.

regulation: Character strings indicating the regulatory guidelines. See Regulation in Details.

min.points: An integer indicating the minimum number of time points to be used to calculate f2.

both.TR.85: Logical. If TRUE, the conventional interpretation of the 85% rule is used, i.e., all measurements up to the time point at which both test and reference products dissolve more than 85% are included in the calculation of f2. See 85% Rule in Details.

print.report: Logical. If TRUE, a plain text report will be saved in the output directory. See Input/Output in Details.

report.style: Character strings indicating the level of detail of the report: "concise", "intermediate", or "detailed".

f2.type: Character strings indicating which type of f2 should be estimated. See Types of f2 in Details.

ci.type: Character strings indicating which type of confidence interval should be estimated. See Types of confidence intervals in Details.

quantile.type: Character strings indicating the type of percentile used for the percentile interval. See Types of confidence intervals in Details.

jackknife.type: Character strings indicating the type of jackknife method. See Details.

time.unit: Character strings indicating the unit of time. It should be either "min" for minute or "h" for hour. It is mainly used for checking the CV rule. See Regulation in Details.

output.to.screen: Logical. If TRUE, a summary report very similar to the concise style report will be printed on screen. See Input/Output in Details.

sim.data.out: Logical. If TRUE, the individual bootstrap data sets will be included in the output.
Input/Output

Arguments test and ref must be provided by the user. They should be R data frames, with time as the first column and all individual profiles as the rest of the columns. The actual names of the columns do not matter since they will be renamed internally.
The dissolution data of the test and reference products can either be provided as data frames for test and ref, as explained above, or be read from an Excel file with the data of test and reference stored in separate worksheets. In the latter case, the argument path.in, the directory where the Excel file is, and file.in, the name of the Excel file including the file extension .xls or .xlsx, must be provided. In that case, the arguments test and ref must be the names of the worksheets in quotation marks. The first column of each Excel worksheet must be time, and the rest of the columns are individual dissolution profiles. The first row should contain column names, such as time, unit01, unit02, ... The actual names of the columns do not matter as they will be renamed internally.
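For instance, a minimal sketch of reading from an Excel file (the file name, directory, and worksheet names below are assumptions for illustration only):

library(bootf2)
# assumed file name and worksheet names; adjust to your own data
xlsx.file <- "dissolution_data.xlsx"
if (file.exists(xlsx.file)) {
  d <- bootf2(test = "test", ref = "ref", path.in = ".", file.in = xlsx.file,
              n.boots = 1000L, print.report = FALSE)
}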
Arguments path.out and file.out are the names of the output directory and file. If they are not provided, but the argument print.report is TRUE, then a plain text report will be generated automatically in the current working directory with the file name test_vs_ref_TZ_YYYY-MM-DD_HHMMSS.txt, where test and ref are the data set names of test and reference, TZ is the time zone such as CEST, YYYY-MM-DD is the numeric date, and HHMMSS is the numeric time in hours, minutes, and seconds.

For a quick check, set the argument output.to.screen = TRUE; a summary report very similar to the concise style report will be printed on screen.
Types of f2

According to Shah et al., the population f2 for the inference is

f_2 = 50 \log_{10}\left(\frac{100}{\sqrt{1 + \frac{1}{P}\sum_{i=1}^{P}\left(\mu_{T,i} - \mu_{R,i}\right)^2}}\right),

where P is the number of time points; \mu_{T,i} and \mu_{R,i} are the population means of the test and reference product at time point i, respectively; \sum_{i=1}^{P} is the summation from i = 1 to P.

Five estimators for f2 are included in the function:

The estimated f2, denoted by est.f2, is the one written in various regulatory guidelines. It is expressed differently, but mathematically equivalently, as

\hat{f}_2 = 50 \log_{10}\left(\frac{100}{\sqrt{1 + \frac{1}{P}\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2}}\right),

where P is the number of time points; \bar{X}_{T,i} and \bar{X}_{R,i} are the mean dissolution data at the i-th time point of random samples chosen from the test and the reference population, respectively. Compared to the equation of the population f2 above, the only difference is that in the equation of est.f2 the sample means of the dissolution profiles replace the population means for the approximation. In other words, a point estimate is used for the statistical inference in practice.

The bias-corrected f2, denoted by bc.f2, was described in the article of Shah et al. as

\hat{f}_{2,\mathrm{bc}} = 50 \log_{10}\left(\frac{100}{\sqrt{1 + \frac{1}{P}\left[\sum_{i=1}^{P}\left(\bar{X}_{T,i} - \bar{X}_{R,i}\right)^2 - \frac{1}{n}\sum_{i=1}^{P}\left(S_{T,i}^2 + S_{R,i}^2\right)\right]}}\right),

where S_{T,i}^2 and S_{R,i}^2 are the unbiased estimates of variance at the i-th time point of random samples chosen from the test and reference population, respectively; and n is the sample size.

The variance- and bias-corrected f2, denoted by vc.bc.f2, does not assume equal weight of the variances as bc.f2 does: the term S_{T,i}^2 + S_{R,i}^2 above is replaced by the weighted sum w_{T,i} S_{T,i}^2 + w_{R,i} S_{R,i}^2, where w_{T,i} and w_{R,i} are weighting factors for the variances of the test and reference products, respectively, calculated from the sample variances (see the package vignette for the exact expressions).

The expected f2, denoted by exp.f2, is calculated based on the mathematical expectation of the estimated f2, using the mean dissolution profiles and variances from the samples to approximate the population values.

The variance-corrected expected f2, denoted by vc.exp.f2, is calculated in the same way as exp.f2, but with the variance term weighted by w_{T,i} and w_{R,i} as described above.
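As an illustration of the two simplest estimators above, here is a minimal sketch in base R (not the package's internal implementation); xt and xr are the mean dissolution values, st2 and sr2 the unbiased variances at each time point, and n the number of units per product:

# estimated f2 (est.f2) from mean profiles xt and xr
f2.est <- function(xt, xr) {
  p <- length(xt)
  50 * log10(100 / sqrt(1 + sum((xt - xr)^2) / p))
}
# bias-corrected f2 (bc.f2) of Shah et al.
f2.bc <- function(xt, xr, st2, sr2, n) {
  p <- length(xt)
  d <- sum((xt - xr)^2) - sum(st2 + sr2) / n
  50 * log10(100 / sqrt(1 + d / p))
}
# quick check with made-up mean profiles
f2.est(xt = c(40, 60, 75, 85, 92), xr = c(35, 55, 72, 83, 90))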
Types of confidence intervals

The following confidence intervals are included:

The normal interval with bias correction, denoted by normal in the function, is estimated according to Davison and Hinkley as

\hat{f}_{2,L} = \hat{f}_{2,S} - \hat{b} - \sqrt{\hat{v}}\,\Phi^{-1}(1-\alpha), \quad
\hat{f}_{2,U} = \hat{f}_{2,S} - \hat{b} + \sqrt{\hat{v}}\,\Phi^{-1}(1-\alpha),

where \hat{f}_{2,L} and \hat{f}_{2,U} are the lower and upper limits of the confidence interval estimated from bootstrap samples; \hat{f}_{2,S} denotes any of the estimators described above; \Phi^{-1}(1-\alpha) represents the inverse of the standard normal cumulative distribution function with type I error \alpha; \hat{b} and \hat{v} are the resampling estimates of bias and variance, calculated as

\hat{b} = \frac{1}{B}\sum_{b=1}^{B}\hat{f}_{2,b}^{*} - \hat{f}_{2,S} = \bar{f}_{2}^{*} - \hat{f}_{2,S}
and
\hat{v} = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{f}_{2,b}^{*} - \bar{f}_{2}^{*}\right)^2,

where B is the number of bootstrap samples; \hat{f}_{2,b}^{*} is the f2 estimate with the b-th bootstrap sample, and \bar{f}_{2}^{*} is the mean of the bootstrap estimates.
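A sketch of this interval for a vector of bootstrap f2 values (boot.f2) and the f2 of the original data set (f2.obs), with alpha = 0.05 giving a 90% interval:

normal.ci <- function(boot.f2, f2.obs, alpha = 0.05) {
  b <- mean(boot.f2) - f2.obs    # resampling estimate of bias
  v <- var(boot.f2)              # resampling estimate of variance
  z <- qnorm(1 - alpha)
  c(lower = f2.obs - b - z * sqrt(v),
    upper = f2.obs - b + z * sqrt(v))
}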
The basic interval, denoted by basic in the function, is estimated according to Davison and Hinkley as

\hat{f}_{2,L} = 2\hat{f}_{2,S} - \hat{f}_{2,((B+1)(1-\alpha))}^{*}
and
\hat{f}_{2,U} = 2\hat{f}_{2,S} - \hat{f}_{2,((B+1)\alpha)}^{*},

where \hat{f}_{2,((B+1)\alpha)}^{*} and \hat{f}_{2,((B+1)(1-\alpha))}^{*} are the ((B+1)\alpha)-th and ((B+1)(1-\alpha))-th ordered resampling estimates of f2, respectively. When (B+1)\alpha is not an integer, the following equation is used for interpolation:

\hat{f}_{2,((B+1)\alpha)}^{*} = \hat{f}_{2,(k)}^{*} + \frac{\Phi^{-1}(\alpha) - \Phi^{-1}\left(\frac{k}{B+1}\right)}{\Phi^{-1}\left(\frac{k+1}{B+1}\right) - \Phi^{-1}\left(\frac{k}{B+1}\right)}\left(\hat{f}_{2,(k+1)}^{*} - \hat{f}_{2,(k)}^{*}\right),

where k is the integer part of (B+1)\alpha, and \hat{f}_{2,(k)}^{*} and \hat{f}_{2,(k+1)}^{*} are the k-th and the (k+1)-th ordered resampling estimates of f2, respectively.
The percentile intervals, denoted by percentile in the function, are estimated using nine different types of quantiles, Type 1 to Type 9, as summarized in Hyndman and Fan's article and implemented in R's quantile function. Using R's boot package, the program bootf2BCA outputs a percentile interval based on the interpolation equation above. To be able to compare the results among different programs, the same interval, denoted by Percentile (boot), is also included in the function.
The bias-corrected and accelerated (BCa) intervals are estimated according to Efron and Tibshirani as

\hat{f}_{2,L} = \hat{f}_{2,(\alpha_1)}^{*}, \quad \hat{f}_{2,U} = \hat{f}_{2,(\alpha_2)}^{*},

where \hat{f}_{2,(\alpha_1)}^{*} and \hat{f}_{2,(\alpha_2)}^{*} are the 100\alpha_1-th and the 100\alpha_2-th percentile of the resampling estimates of f2, respectively. The type I errors \alpha_1 and \alpha_2 are obtained as

\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha}}{1 - \hat{a}\left(\hat{z}_0 + z_{\alpha}\right)}\right)
and
\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha}}{1 - \hat{a}\left(\hat{z}_0 + z_{1-\alpha}\right)}\right),

where \hat{z}_0 and \hat{a} are called the bias-correction and the acceleration, respectively.

There are different methods to estimate \hat{z}_0 and \hat{a}. Shah et al. used the jackknife technique, denoted by bca.jackknife in the function:

\hat{z}_0 = \Phi^{-1}\left(\frac{\#\left\{\hat{f}_{2,b}^{*} < \hat{f}_{2,S}\right\}}{B}\right)
and
\hat{a} = \frac{\sum_{i=1}^{n}\left(\hat{f}_{2,(\cdot)} - \hat{f}_{2,(i)}\right)^3}{6\left[\sum_{i=1}^{n}\left(\hat{f}_{2,(\cdot)} - \hat{f}_{2,(i)}\right)^2\right]^{3/2}},

where \#\{\cdot\} denotes the number of elements in the set, \hat{f}_{2,(i)} is the i-th jackknife statistic, \hat{f}_{2,(\cdot)} is the mean of the jackknife statistics, and \sum_{i=1}^{n} is the summation from 1 to the sample size n.
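A sketch of the bca.jackknife computation from the formulas above, where jack.f2 holds the jackknife f2 statistics (plain percentile lookup is used here instead of the package's interpolation details):

bca.ci <- function(boot.f2, f2.obs, jack.f2, alpha = 0.05) {
  z0 <- qnorm(sum(boot.f2 < f2.obs) / length(boot.f2))   # bias-correction
  L  <- mean(jack.f2) - jack.f2
  a  <- sum(L^3) / (6 * sum(L^2)^1.5)                    # acceleration
  adj <- function(z) pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  probs <- c(adj(qnorm(alpha)), adj(qnorm(1 - alpha)))
  quantile(boot.f2, probs = probs, names = FALSE)
}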
The program bootf2BCA gives a slightly different BCa interval with R's boot package. This approach, denoted by bca.boot in the function, is also implemented for estimating the interval.
jackknife.type

For any sample with size n, the jackknife estimator is obtained by estimating the parameter for each subsample omitting the i-th observation. However, when two samples (e.g., test and reference) are involved, there are several possible ways to do it. Assuming the sample sizes of test and reference are n_T and n_R, the following three possibilities are considered:

- Estimate by removing one observation from both the test and reference samples. In this case, the prerequisite is n_T = n_R, denoted by nt=nr in the function. So if there are 12 units in the test and reference data sets, there will be 12 jackknife estimators.
- Estimate the jackknife for the test sample while keeping the reference data unchanged, then estimate the jackknife for the reference sample while keeping the test sample unchanged. This is denoted by nt+nr in the function and is the default method. So if there are 12 units in the test and reference data sets, there will be 12 + 12 = 24 jackknife estimators.
- For each observation deleted from the test sample, estimate the jackknife for the reference sample. This is denoted by nt*nr in the function. So if there are 12 units in the test and reference data sets, there will be 12 * 12 = 144 jackknife estimators.
A list of 3 or 5 components:

- boot.ci: A data frame of bootstrap f2 confidence intervals.
- boot.f2: A data frame of all individual f2 values for all bootstrap data sets.
- boot.info: A data frame with detailed information of the bootstrap for reproducibility purposes, such as all arguments used in the function, the time points used for the calculation of f2, and the number of NAs.
- boot.summary: A data frame with descriptive statistics of the bootstrap f2.
- boot.t and boot.r: Lists of individual bootstrap samples for the test and reference product if sim.data.out = TRUE.
Shah, V. P.; Tsong, Y.; Sathe, P.; Liu, J.-P. In Vitro Dissolution Profile Comparison—Statistics and Analysis of the Similarity Factor, f2. Pharmaceutical Research 1998, 15 (6), 889–896. DOI: 10.1023/A:1011976615750.

Davison, A. C.; Hinkley, D. V. Bootstrap Methods and Their Application. Cambridge University Press, 1997.

Hyndman, R. J.; Fan, Y. Sample Quantiles in Statistical Packages. The American Statistician 1996, 50 (4), 361–365. DOI: 10.1080/00031305.1996.10473566.

Efron, B.; Tibshirani, R. An Introduction to the Bootstrap. Chapman & Hall, 1993.
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)

# model.par for reference with low variability
par.r <- list(fmax = 100, fmax.cv = 3, mdt = 15, mdt.cv = 14,
              tlag = 0, tlag.cv = 0, beta = 1.5, beta.cv = 8)

# simulate reference data
dr <- sim.dp(tp, model.par = par.r, seed = 100, plot = FALSE)

# model.par for test
par.t <- list(fmax = 100, fmax.cv = 3, mdt = 12.29, mdt.cv = 12,
              tlag = 0, tlag.cv = 0, beta = 1.727, beta.cv = 9)

# simulate test data with low variability
dt <- sim.dp(tp, model.par = par.t, seed = 100, plot = FALSE)

# bootstrap. To reduce test run time, n.boots of 100 is used in the example.
# In practice, it is recommended to use n.boots of 5000--10000.
# Set `output.to.screen = TRUE` to view the result on screen.
d <- bootf2(dt$sim.disso, dr$sim.disso, n.boots = 100, print.report = FALSE)
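The returned list can then be inspected; the component names follow the Value section above:

# bootstrap confidence intervals
d$boot.ci
# descriptive statistics of the bootstrap f2
d$boot.summary
# detailed bootstrap information for reproducibility
d$boot.info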
calcf2: Calculate Similarity Factor f2

Main function to calculate f2 according to different regulatory guidelines.
calcf2(test, ref, path.in, file.in, path.out, file.out,
       regulation = c("EMA", "FDA", "WHO", "Canada", "ANVISA"),
       cv.rule = TRUE, message = FALSE, min.points = 3L,
       f2.type = c("est.f2", "exp.f2", "bc.f2", "vc.exp.f2", "vc.bc.f2", "all"),
       both.TR.85 = FALSE, digits = 2L, time.unit = c("min", "h"),
       plot = TRUE, plot.start.time = 0, plot.max.unit = 24L)
test, ref: Data frames of dissolution profiles of the test and reference products if path.in and file.in are not provided; otherwise, character strings of the worksheet names in the Excel file. See Input/Output in Details.

path.in, file.in, path.out, file.out: Character strings of input and output directories and file names. See Input/Output in Details.

regulation: Character strings indicating regulatory guidelines. See Regulation in Details.

cv.rule: Logical. If TRUE, the requirements on the CV of the dissolution data (see Regulation in Details) will be checked.

message: Logical. If TRUE, information about the compliance of the data with the regulatory requirements will be printed on screen.

min.points: An integer indicating the minimum number of time points to be used to calculate f2.

f2.type: Character strings indicating which type of f2 should be calculated. See Types of f2 in the bootf2 section above.

both.TR.85: Logical. If TRUE, the conventional interpretation of the 85% rule is used, i.e., all measurements up to the time point at which both test and reference products dissolve more than 85% are included in the calculation of f2. See 85% Rule in Details.

digits: An integer indicating the number of decimal places for the output.

time.unit: Character strings indicating the unit of time. It should be either "min" for minute or "h" for hour. It is mainly used for checking the CV rule and for plotting. See Regulation in Details.

plot: Logical. If TRUE, a dissolution versus time plot will be included in the output.

plot.start.time: Numeric value indicating the starting time for the plot.

plot.max.unit: Integer. If the number of individual units is no more than this value, the mean and all individual profiles will be plotted; otherwise, individual profiles will be represented by boxplots at each time point. Therefore, to avoid overplotting, this value should not be too large.
Input/Output

Arguments test and ref must be provided by the user. They should be R data frames, with time as the first column and all individual profiles as the rest of the columns, or with the mean profile as the second column if only the mean profile is available. The actual names of the columns do not matter since they will be renamed internally.
The dissolution data of the test and reference products can either be provided as data frames for test and ref, as explained above, or be read from an Excel file with the data of test and reference stored in separate worksheets. In the latter case, the argument path.in, the directory where the Excel file is, and file.in, the name of the Excel file including the file extension .xls or .xlsx, must be provided. In that case, the arguments test and ref must be the names of the worksheets in quotation marks. The first column of each Excel worksheet must be time, and the rest of the columns are individual dissolution profiles, or the second column must be the mean profile if only mean data is available. The first row should contain column names, such as time, unit01, unit02, ... The actual names of the columns do not matter as they will be renamed internally.
Arguments path.out and file.out are the names of the output directory and file. It would be overkill to write such simple calculations to a file; therefore, unless these two arguments are specified by the user, results are printed on screen by default.
Regulation

To apply the f2 method, different regulatory guidelines have slightly different requirements. Some requirements are almost universal, such as the same time points for the test and reference products, a minimum of 3 time points (excluding time zero), and twelve individual profiles for each formulation. Other requirements differ slightly among regulatory guidelines, or are at least interpreted differently. The two main issues are the rules for the variability (CV Rule) and for time points where dissolution is more than 85% (85% Rule).
CV Rule

- EMA, Canada, and ANVISA: The CV of the first time point should not be greater than 20%, and the CV of the remaining time points should not be greater than 10%.
- WHO: The CV should not be greater than 20% for time points up to 10 min, and not greater than 10% for the remaining time points.
- FDA: US FDA is more flexible. The CV for the early time points should not be greater than 20%, and for the remaining time points, not greater than 10%.
The phrase "the first time point" in the EMA rule was later interpreted as all time points up to 10 min, according to an unofficial communication with a European regulator. This makes the EMA rule the same as the WHO rule. For example, if there are 5 min and 10 min time points in the dissolution profiles, the CV for both 5 min and 10 min should not be greater than 20%.

The "first time point" in the ANVISA rule corresponds to 40% of the total collected points. For example, for a dissolution profile with five collection times, the first two collection times are considered first points.

The phrase "early time points" in the FDA rule is typically interpreted as those points up to 15 min, sometimes even up to 20 min, according to an unofficial communication with FDA staff. In the function calcf2(), the cut-off point for the FDA rule is 15 min.
85% Rule

This rule is implemented as follows:

- EMA, FDA, Canada, and ANVISA: Only one measurement is considered after 85% dissolution for any product.
- WHO: Dissolution profiles should be 'cut' at the time point where the reference releases more than 85%. Therefore, the WHO rule only differs from the rule of EMA, FDA, Canada, and ANVISA when the test product dissolves faster than the reference. If the reference product dissolves faster, the rules of all five regulatory bodies are the same in this regard.
The exact phrase in the US FDA guidance regarding this rule is that "Only one measurement should be considered after 85% dissolution of both the products." Due to the ambiguous word "both" used in the sentence, the conventional interpretation was that all measurements up to the time point at which both test and reference dissolved more than 85% should be included in the calculation of f2. However, this is only true when both test and reference dissolve more than 85% at the same time point.

Consider the following example:
Consider the following example:
time | test | reference |
5 | 7 | 10 |
10 | 15 | 20 |
15 | 50 | 55 |
20 | 69 | 86 |
30 | 82 | 90 |
45 | 84 | 95 |
60 | 86 | 97 |
According to the conventional interpretation, all measurements up to 60 min should be included to calculate f2 because both test and reference dissolved more than 85% only at 60 min, not at any earlier time point. However, in that case, there would be 4 measurements of the reference (20, 30, 45, and 60 min) included in the calculation, which would directly contradict the phrase "Only one measurement should be considered after 85% ..." in the same statement of the guidance.

In an unofficial communication using this example, FDA staff confirmed that only the first 4 time points (up to 20 min) would be used. In other words, the FDA rule in this regard is the same as the EMA rule.
The statement in the ANVISA guideline also uses the word "ambos" (meaning "both"), which could lead to a similar confusion. Following the same logic as demonstrated above, it should be interpreted as the same rule as in the EMA guideline.
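As a sketch, the example table above can be entered as mean profiles and passed to calcf2(); since only mean data are used here, the CV rule cannot be checked, so cv.rule = FALSE is assumed:

t.mean <- data.frame(time = c(5, 10, 15, 20, 30, 45, 60),
                     test = c(7, 15, 50, 69, 82, 84, 86))
r.mean <- data.frame(time = c(5, 10, 15, 20, 30, 45, 60),
                     ref  = c(10, 20, 55, 86, 90, 95, 97))
# with regulation = "FDA", only the first four time points (up to 20 min)
# are expected to be used, as discussed above
calcf2(t.mean, r.mean, regulation = "FDA", cv.rule = FALSE, plot = FALSE)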
Read vignette Introduction to bootf2 for more details.
A data frame containing the f2 type and the f2 value, the number of time points used for the calculation (f2.tp), an indicator of whether both test and reference dissolve more than 85% at 15 min (d85at15), and other information used for the calculation.
tp <- c(5, 10, 15, 20, 30, 45, 60)

mod.par.t <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
                  mdt = 20, mdt.cv = 5, beta = 2.2, beta.cv = 5)
d.t <- sim.dp(tp, model.par = mod.par.t, seed = 100, n.units = 120L,
              plot = FALSE)$sim.disso

mod.par.r <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
                  mdt = 25, mdt.cv = 4, beta = 2.1, beta.cv = 3)
d.r <- sim.dp(tp, model.par = mod.par.r, seed = 100, n.units = 120L,
              plot = FALSE)$sim.disso

# set `message = TRUE` to view the compliance with the regulatory guidelines.
calcf2(d.t, d.r, plot = FALSE)
Helper Functions
bpwhisker.l(x)
bpwhisker.u(x)
ci.header(boot.info)
mod.ref(tp, ref.dp, digits = 4, model, max.disso, time.unit)
rpt.ci(f2type, btsum, boot.info)
rpt.f2(f2type, f2o, boot.info, a.jack, btsum)
rpt.concise(boot.f2.ci, boot.info, f2o, a.jack, btsum)
rpt.detailed(data.t, data.r, boot.t, boot.r, boot.f2, boot.info, f2o)
rpt.info(boot.info)
rpt.intermediate(boot.info, boot.f2)
rpt.screen(boot.f2.ci, boot.info, f2o, a.jack, btsum)
x: Numeric vector.

boot.info: A data frame of bootstrap information from bootf2().

tp, ref.dp: Numeric vectors of the time points and the reference dissolution profile.

digits: An integer indicating the number of decimal places for the output.

model: Character strings of model names. Currently only "Weibull" and "first-order" models are supported.

max.disso: Numeric value indicating the maximum dissolution.

time.unit: Character strings indicating the unit of time. It should be either "min" for minute or "h" for hour.

f2type: Character strings indicating the f2 type.

btsum: A data frame of descriptive statistics of the bootstrap data set.

f2o: Vector of f2 values calculated with the original data set.

a.jack: Data frame of acceleration values obtained with the jackknife method.

boot.f2.ci: Matrix of f2 confidence intervals calculated from bootstrap data sets.

data.t, data.r: Input data sets for test and reference.

boot.t, boot.r: Lists of bootstrap data sets for test and reference.

boot.f2: Matrix of f2 values calculated from bootstrap data sets.
Dissolution data of the reference batch and 5 batches of test products published in the article of Shah et al., Pharm. Res. 1998, 15 (6), 889–896.

shah1998

A list of 6 data frames. Each data frame is a set of dissolution profiles with the format: the first column is time, and the rest of the columns are individual profiles.

- dissolution of the reference batch
- dissolution of test batch 1
- dissolution of test batch 2
- dissolution of test batch 3
- dissolution of test batch 4
- dissolution of test batch 5
Shah VP, Tsong Y, Sathe P, Liu JP. In vitro dissolution profile comparison–statistics and analysis of the similarity factor, f2. Pharm Res. 1998 Jun;15(6):889-96. doi: 10.1023/a:1011976615750.
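A sketch of using this data set with calcf2(); the element order is assumed to follow the description above (reference first, then test batches 1 to 5), and cv.rule is disabled here only so the sketch runs regardless of the data's variability:

library(bootf2)
# compare test batch 1 with the reference batch
calcf2(shah1998[[2]], shah1998[[1]], cv.rule = FALSE, plot = FALSE)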
sim.dp: Simulate Dissolution Profiles

Function to simulate dissolution profiles based on mathematical models or multivariate normal distribution.
sim.dp(tp, dp, dp.cv, model = c("Weibull", "first-order"), model.par = NULL,
       seed = NULL, n.units = 12L, product, max.disso = 100,
       ascending = FALSE, message = FALSE, time.unit = c("min", "h"),
       plot = TRUE, plot.max.unit = 36L, empirical = TRUE, cv.tol = 1e-6)
tp: Numeric vector of time points for the dissolution profiles. See Details.

dp, dp.cv: Numeric vectors of the target mean dissolution profile (dp) and its CV at time points tp (dp.cv), expressed in percentage. See Details.

model: Character strings of model names. Currently only "Weibull" and "first-order" models are supported.

model.par: A list with model parameters. If missing, a list of random model parameters will be generated. See Details.

seed: Integer seed value for reproducibility. If missing, a random seed will be generated and recorded so that the results can be reproduced.

n.units: An integer indicating the number of individual profiles to be simulated.

product: Character strings indicating the product name of the simulated profiles. If missing, a random name with 3 letters and 4 digits will be generated.

max.disso: Numeric value for the maximum possible dissolution. In theory, the maximum dissolution is 100%, but in practice, it is not uncommon to see higher values, such as 105%, or much lower values, especially for products with low solubility.

ascending: Logical. If TRUE, all simulated individual dissolution profiles are forced to increase with time. See Details.

message: Logical. If TRUE, basic information about the simulation will be printed on screen.

time.unit: Character strings indicating the unit of time. It should be either "min" for minute or "h" for hour. See Details.

plot: Logical. If TRUE, a plot of the simulated profiles will be included in the output.

plot.max.unit: Integer. If the number of individual units is no more than this value, the mean and all individual profiles will be plotted; otherwise, individual profiles will be represented by boxplots at each time point. Therefore, to avoid overplotting, this value should not be too large.

empirical: Logical. If TRUE, dp and dp.cv are treated as the sample (empirical) mean and CV, so the mean and CV computed from the simulated profiles match them exactly; otherwise, they are treated as population values. Only relevant for the multivariate normal distribution approach.

cv.tol: Numeric value for the CV tolerance.
The approach used to simulate individual dissolution profiles depends on whether the target mean dissolution profile dp is provided by the user.

If dp is not provided, it will be calculated using tp, model, and model.par. The parameters defined by model.par are considered the population parameters; consequently, the calculated dp is considered the target population profile. In addition, n.units sets of individual model parameters will be simulated based on the exponential error model, and individual dissolution profiles will be generated using those individual parameters. The mean of all simulated individual profiles, sim.mean, included in one of the output data frames, sim.summary, will be similar, but not identical, to dp. The difference between sim.mean and dp is determined by the number of units and the variability of the model parameters. In general, the larger the number of units and the lower the variability, the smaller the difference. Additional details on the mathematical models are given below.
If dp is provided, then n.units individual dissolution profiles will be simulated using the multivariate normal distribution. The mean of all simulated individual profiles, sim.mean, will be identical to dp. In this case, it is recommended that dp.cv, the CV at time points tp, also be provided by the user. If dp.cv is not provided, it will be generated automatically such that the CV is between 10% and 20% for time points up to 10 min, between 1% and 3% for time points where the dissolution is more than 95%, between 3% and 5% for time points where the dissolution is between 90% and 95%, and between 5% and 10% for the rest of the time points. Whether dp.cv is provided or generated automatically, the CV calculated using all individual profiles will be equal to dp.cv. Additional details on this approach are given below.
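A sketch verifying this behaviour with the multivariate normal approach (the values are reused from the Examples section below; empirical = TRUE is the default):

tp    <- c(5, 10, 15, 20, 30, 45, 60)
dp    <- c(39, 56, 67, 74, 83, 90, 94)
dp.cv <- c(19, 15, 10, 8, 8, 5, 3)
d3 <- sim.dp(tp, dp = dp, dp.cv = dp.cv, seed = 1234, plot = FALSE)
# CV (%) computed from the simulated individual profiles at each time point;
# it should reproduce dp.cv
round(apply(d3$sim.disso[, -1], 1, function(x) 100 * sd(x) / mean(x)), 1)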
If dp is provided by the user, logically, tp must also be provided, and the approach based on the multivariate normal distribution is used, as explained above. If dp is not provided, tp may be missing, i.e., a simple function call such as sim.dp() will simulate dissolution profiles. In that case, a default tp will be generated depending on time.unit: if time.unit is "min", tp will be c(5, 10, 15, 20, 30, 45, 60); otherwise, tp will be c(1, 2, 3, 4, 6, 8, 10, 12). The remaining arguments are either optional, or required by the function but with default values.
The first-order model is expressed as

f_t = f_{\max}\left(1 - e^{-k\left(t - t_{\mathrm{lag}}\right)}\right),

and the Weibull model is expressed either as

f_t = f_{\max}\left(1 - e^{-\frac{\left(t - t_{\mathrm{lag}}\right)^{\beta}}{\alpha}}\right)
or
f_t = f_{\max}\left(1 - e^{-\left(\frac{t - t_{\mathrm{lag}}}{\mathrm{MDT}}\right)^{\beta}}\right),

where f_max is the maximum dissolution, MDT is the mean dissolution time, t_lag is the lag time, \alpha and \beta are the scale and shape factors in the Weibull function, and k is the rate constant in the first-order model. Obviously, the two Weibull models are mathematically equivalent by letting \alpha = \mathrm{MDT}^{\beta}.
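For reference, the two models written as plain R functions (a sketch; parameter names mirror the model.par list described below, and times before tlag are simply clamped to zero here):

f.first.order <- function(t, fmax, k, tlag) {
  fmax * (1 - exp(-k * pmax(t - tlag, 0)))
}
f.weibull <- function(t, fmax, mdt, beta, tlag) {
  fmax * (1 - exp(-(pmax(t - tlag, 0) / mdt)^beta))
}
# e.g., the population Weibull profile used for the reference product
# in the bootf2 example above
f.weibull(t = c(5, 10, 15, 20, 30, 45, 60), fmax = 100, mdt = 15,
          beta = 1.5, tlag = 0)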
Individual model parameters are simulated according to the exponential error model

P_{\mathrm{ind}} = P_{\mathrm{pop}}\, e^{N\left(0,\, \sigma^2\right)},

where P_ind and P_pop denote the individual and population model parameters, and N(0, \sigma^2) represents the normal distribution with mean zero and variance \sigma^2, with \sigma determined by the CV specified for the corresponding model parameter.
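A simple sketch of this error model for one parameter, assuming \sigma is approximated by CV/100 (this relation is an assumption for illustration, not necessarily the exact one used internally):

sim.param <- function(pop.value, cv, n.units) {
  # exponential error model: individual = population * exp(N(0, sigma^2))
  pop.value * exp(rnorm(n.units, mean = 0, sd = cv / 100))
}
# e.g., 12 individual mdt values around a population mdt of 15 with 14% CV
sim.param(pop.value = 15, cv = 14, n.units = 12)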
model.par

The argument model.par should be supplied as a named list of the model parameters explained above, together with their respective CVs for the simulation of individual parameters. Therefore, for the first-order model, three pairs of parameters should be specified: fmax/fmax.cv, k/k.cv, and tlag/tlag.cv; for the Weibull model, four pairs: fmax/fmax.cv, tlag/tlag.cv, beta/beta.cv, and either alpha/alpha.cv or mdt/mdt.cv, depending on the mathematical formula used. CV should be given in percentage, e.g., if the CV for beta is 30%, it should be specified as beta.cv = 30, not beta.cv = 0.3. The order of the parameters does not matter, but the names of the parameters must be given exactly as described above. For example:

model.par = list(fmax = 100, fmax.cv = 5, k = 0.6, k.cv = 25, tlag = 0, tlag.cv = 0)

for the first-order model, and

model.par = list(fmax = 100, fmax.cv = 5, tlag = 5, tlag.cv = 10, mdt = 15, mdt.cv = 20, beta = 1.5, beta.cv = 5)

or

model.par = list(fmax = 100, fmax.cv = 5, tlag = 5, tlag.cv = 10, alpha = 60, alpha.cv = 20, beta = 1.5, beta.cv = 5)

for Weibull models.
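For instance, the first-order list above can be passed to sim.dp() directly (a sketch; note that model = "first-order" must be selected explicitly since "Weibull" is the default):

tp <- c(5, 10, 15, 20, 30, 45, 60)
d.fo <- sim.dp(tp, model = "first-order",
               model.par = list(fmax = 100, fmax.cv = 5, k = 0.6, k.cv = 25,
                                tlag = 0, tlag.cv = 0),
               seed = 123, plot = FALSE)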
ascending

When the multivariate normal distribution approach is used, depending on dp/dp.cv, there is no guarantee that all individual profiles increase with time; near the end of the time points, some individual profiles may decrease, especially when the dissolution is close to the plateau and the variability is high. This can happen with real-life data, especially for products with drug substances that are unstable in the dissolution medium. To force all profiles to increase, set ascending = TRUE. Depending on the data, the program may take a long time to run, or may even fail.
A list of 3 to 5 components:

- sim.summary: A data frame with summary statistics of all individual dissolution profiles.
- sim.disso: A data frame with all individual dissolution profiles.
- sim.info: A data frame with information about the simulation, such as the seed number and the number of individual profiles. If the modelling approach is used, the data frame will also contain the model parameters.
- model.par.ind: A data frame of all individual model parameters that were used for the simulation of the individual dissolution profiles. Available only if the modelling approach is used, i.e., when dp is missing.
- sim.dp: A plot. Available only if the argument plot is TRUE.
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)

# using all default values to simulate profiles
d1 <- sim.dp(tp, plot = FALSE)

# summary statistics
d1$sim.summary
# individual profiles
d1$sim.disso
# simulation information
d1$sim.info
# individual model parameters
d1$mod.par.ind

# using Weibull model to simulate 100 profiles without lag time
mod.par <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
                mdt = 20, mdt.cv = 5, beta = 2.2, beta.cv = 5)
d2 <- sim.dp(tp, model.par = mod.par, seed = 100, n.units = 100L,
             plot = FALSE)

# using multivariate normal distribution approach
# target mean profile with same length as tp
dp <- c(39, 56, 67, 74, 83, 90, 94)
# CV% at each time point
dp.cv <- c(19, 15, 10, 8, 8, 5, 3)
# simulation
d3 <- sim.dp(tp, dp = dp, dp.cv = dp.cv, seed = 1234, plot = FALSE)
sim.dp.byf2: Simulate Dissolution Profiles with Predefined Target f2

Given any mean dissolution profile dp, this function will simulate a mean dissolution profile such that, when the newly simulated profile is compared to dp, the calculated f2 will be equal to the predefined target value.
sim.dp.byf2(tp, dp, target.f2, seed = NULL, min.points = 3L,
            regulation = c("EMA", "FDA", "WHO", "Canada", "ANVISA"),
            model = c("Weibull", "first-order"), digits = 2L,
            max.disso = 100, message = FALSE, both.TR.85 = FALSE,
            time.unit = c("min", "h"), plot = TRUE, sim.dp.out,
            sim.target = c("ref.pop", "ref.median", "ref.mean"),
            model.par.cv = 50, fix.fmax.cv = 0, random.factor = 3)
tp, dp: Numeric vectors of the time points and the target mean dissolution profile.

target.f2: Numeric value of the target f2. It can be a single value, or a vector of two values representing the lower and upper limits of the target. See Details.

seed: Integer seed value for reproducibility. If missing, a random seed will be generated and recorded so that the results can be reproduced.

min.points: An integer indicating the minimum number of time points to be used to calculate f2.

regulation: Character strings indicating regulatory guidelines. See Regulation in the calcf2 section above.

model: Character strings of model names. Currently only "Weibull" and "first-order" models are supported.

digits: An integer indicating the number of decimal places for the output.

max.disso: Numeric value for the maximum possible dissolution. In theory, the maximum dissolution is 100%, but in practice, it is not uncommon to see higher values, such as 105%, or much lower values, especially for products with low solubility.

message: Logical. If TRUE, basic information about the simulation will be printed on screen.

both.TR.85: Logical. If TRUE, the conventional interpretation of the 85% rule is used. See 85% Rule in the calcf2 section above.

time.unit: Character strings indicating the unit of time. It should be either "min" for minute or "h" for hour.

plot: Logical. If TRUE, a plot will be included in the output.

sim.dp.out: Output of the function sim.dp(). If provided, the model parameters can be read directly from it. See Details.

sim.target: Character strings indicating to which target profile the newly simulated profile should be compared for the calculation of f2. See Details.

model.par.cv, fix.fmax.cv: Numeric values expressed as percentages, used for the random generation of model parameters and fmax when the optimization algorithm is not used, i.e., when target.f2 is a vector of two values. See Details.

random.factor: Numeric value used for the random generation of model parameters when the optimization algorithm is used, i.e., when target.f2 is a single value. See Details.
The main principle of the function is as follows:

For any given mean dissolution profile dp, fit a suitable mathematical model and obtain the model parameters. No precise fitting is required since those parameters only serve as initial values for further model fitting. If sim.dp.out, the output of the function sim.dp(), is available, no initial fitting is necessary, as the model parameters can be read directly from the output, unless the multivariate normal distribution approach was used in sim.dp(); in that case, initial model fitting will be done.

Then find suitable model parameters and simulate a new dissolution profile, comparing the new profile to the provided reference profile dp by calculating f2. If the obtained f2 is equal to, or within the lower and upper limits of, target.f2, the function will output the obtained model parameters and the new profile.
There are two approaches used to find suitable model parameters:

- If target.f2 is a single value, an optimization algorithm will be used and the newly simulated dissolution profile will have an f2 equal to target.f2 when compared to dp (within the numeric precision defined by the tolerance).
- If target.f2 is a vector of two numbers representing the lower and upper limits of the target value, such as target.f2 = c(lower, upper), then the dissolution profile will be obtained by random searching and the calculated f2 will be within the range of lower and upper.

For example, you can set target.f2 = c(54.95, 55.04) to get a target f2 of 55. Since f2 should normally be reported without decimals, in practice this precision is enough. You might be able to use c(54.99, 55.01) if you really need more precision. However, the narrower the range, the longer it takes the function to run. With a narrow range such as c(54.999, 55.001), the program will most likely fail. In such a case, provide a single value to use the optimization algorithm.
Arguments model.par.cv, fix.fmax.cv, and random.factor are numeric values used to fine-tune the random generation of the model parameters. The default values should work in most scenarios. These values should be changed only when the function fails to return any result. Read the vignette of the function (vignette("sim.dp.byf2", package = "bootf2")) for more details.
The data frame sim.summary in sim.dp.out, the output of the function sim.dp(), contains dp, the population profile, and sim.mean and sim.median, the mean and median profiles calculated from the n.units simulated individual profiles. Any of these three profiles can be used as the target profile to which the newly simulated profile will be compared. The argument sim.target defines which of the three is used: ref.pop, ref.mean, and ref.median correspond to dp, sim.mean, and sim.median, respectively.
A list of 2 components: a data frame of model parameters and a data frame of the mean dissolution profile generated using those model parameters. The output can be passed to the function sim.dp() to further simulate multiple individual profiles.
tp <- c(5, 10, 15, 20, 30, 45, 60)

mod.par.r <- list(fmax = 100, fmax.cv = 2, tlag = 0, tlag.cv = 0,
                  mdt = 25, mdt.cv = 4, beta = 2.1, beta.cv = 3)
d.r <- sim.dp(tp, model.par = mod.par.r, seed = 100, n.units = 120L,
              plot = FALSE)

model.par1 <- sim.dp.byf2(sim.dp.out = d.r, target.f2 = 60, seed = 123)
model.par2 <- sim.dp.byf2(sim.dp.out = d.r, target.f2 = c(59.95, 60.04),
                          seed = 123)