Statistical applications in the pharmaceutical and chemical fields

by Riccardo Bonfichi

From 3 Batches to Continuous Confidence: How Monte Carlo & Bootstrap Turn Process Validation into Predictive Quality

06/23/2025

Abstract

Process-validation practice has moved from a one-off, three-batch exercise to a life-cycle discipline that must quantify risk from the first conformance lot to the last commercial shipment. Yet many pharmaceutical processes are still assessed with static capability indices or control-chart limits that say little about the future. This article shows how modern resampling techniques—Monte Carlo simulation and the bootstrap—convert scarce early data into predictive risk statements and continuously refine those statements as production data accumulate.
In Case Study 1 we take the minimal information available at Process Performance Qualification—a min-mode-max triplet from three lots—and, by simulating triangular, uniform, and heuristic-normal priors, estimate:
(i) the long-run proportion of out-of-specification lots,
(ii) a provisional Cpk with Monte Carlo uncertainty, and
(iii) worst-case assay limits at the 95% level.
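As a rough illustration of the Case Study 1 idea, the Python sketch below (not the author's R script) draws simulated assay values from a triangular distribution defined by a min-mode-max triplet and summarizes them. The triplet, the specification limits, the function name, and the reading of "95% level" as a central 95% range are all assumptions made for this example.

```python
import random
import statistics


def simulate_triangular(low, mode, high, lsl, usl, n_sim=100_000, seed=1):
    """Monte Carlo summary of a triangular prior built from min, mode and max.

    Returns the simulated out-of-specification (OOS) fraction, a provisional
    Cpk, and the 2.5% / 97.5% quantiles of the simulated values.
    """
    rng = random.Random(seed)
    # Note the argument order of random.triangular: (low, high, mode).
    draws = [rng.triangular(low, high, mode) for _ in range(n_sim)]

    oos_fraction = sum(1 for x in draws if x < lsl or x > usl) / n_sim

    mean = statistics.fmean(draws)
    sd = statistics.stdev(draws)
    cpk = min(usl - mean, mean - lsl) / (3 * sd)

    # n=40 gives cut points every 2.5%: first is 2.5%, last is 97.5%.
    cuts = statistics.quantiles(draws, n=40)
    return {
        "oos_fraction": oos_fraction,
        "cpk": cpk,
        "lower_95": cuts[0],
        "upper_95": cuts[-1],
    }


# Hypothetical assay triplet (% of label claim) and specification limits.
result = simulate_triangular(low=97.5, mode=99.5, high=101.5, lsl=98.0, usl=102.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Repeating the call with a uniform or normal generator in place of `rng.triangular` would show how sensitive the risk statement is to the assumed prior.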
In Case Study 2 we shift to Continued Process Verification, analyzing twenty additional commercial lots with a 5000-fold non-parametric bootstrap. The method yields a bias-corrected 95% confidence interval for Cpk and an upper bound on the true failure rate, both of which shrink as the lot count rises, providing an objective trigger for the next CPV review. Open-source R scripts in my public GitHub repository reproduce every figure and table: https://github.com/rbonfichi/process-validation-simulation
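The bootstrap step of Case Study 2 can be sketched in Python as follows. This is a simplified illustration, not the author's R code: it uses the plain percentile interval rather than the bias-corrected variant mentioned above, and the twenty lot values and specification limits are hypothetical.

```python
import random
import statistics


def cpk(values, lsl, usl):
    """Cpk from the sample mean and sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * sd)


def bootstrap_cpk_ci(values, lsl, usl, n_boot=5000, alpha=0.05, seed=1):
    """Point estimate and percentile bootstrap interval for Cpk."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_boot):
        resample = rng.choices(values, k=n)  # sampling with replacement
        if len(set(resample)) > 1:  # skip resamples with zero spread
            stats.append(cpk(resample, lsl, usl))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return cpk(values, lsl, usl), lower, upper


# Hypothetical assay results (% of label claim) for twenty commercial lots.
lots = [99.6, 100.1, 99.8, 100.4, 99.3, 100.0, 99.9, 100.2, 99.7, 100.3,
        99.5, 100.1, 99.8, 100.0, 100.2, 99.6, 99.9, 100.5, 99.4, 100.0]

point, lower, upper = bootstrap_cpk_ci(lots, lsl=98.0, usl=102.0)
print(f"Cpk = {point:.2f}, 95% percentile interval = ({lower:.2f}, {upper:.2f})")
```

Re-running the function as new lots are appended to `lots` shows the interval narrowing, which is the behaviour the abstract describes as a trigger for the next CPV review.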
The two examples demonstrate that resampling does not eliminate uncertainty, but it quantifies and progressively reduces it, turning backward-looking validation into forward-looking predictive assurance.
Monte Carlo and bootstrap outputs integrate seamlessly with traditional control charts, speak the quantitative language of ICH Q8–Q12 and FDA 2011 guidance, and can be automated within existing electronic batch-record systems.
Implementing these techniques therefore offers QA teams a practical, regulator-aligned path from static compliance to continuous, risk-based process verification.


Understanding Microbial Count Distributions:
Choosing the Right Model for Control Charts

02/24/2025

Abstract

Microbial count monitoring is a fundamental aspect of Quality Assurance in pharmaceutical production, cleanrooms, and other controlled environments. These data are discrete count variables, meaning they represent the number of microbial occurrences within a defined unit of time, area, or volume. Traditional statistical approaches often assume that microbial counts follow a Poisson distribution, where the mean equals the variance. However, real-world microbial data frequently exhibit overdispersion (variance greater than the mean) or zero inflation (excess zeros beyond Poisson expectations). Misidentifying the underlying distribution can lead to incorrect statistical inferences and inappropriate control limits, ultimately compromising process monitoring and regulatory compliance.
This article explores the limitations of the Poisson model in microbial data analysis and introduces more appropriate models, such as the Negative Binomial (NB) and Zero-Inflated models (ZIP, ZINB). The Negative Binomial distribution accounts for overdispersion, while Zero-Inflated models address the presence of excessive zeros in the dataset. Selecting an appropriate model is not just a theoretical concern: it directly influences the effectiveness of control charts, which are used to detect microbial excursions and ensure that processes remain within acceptable limits.
A major focus of this study is the long-term impact of control chart selection. In pharmaceutical manufacturing, control limits derived from these charts are often used for Continued Process Verification (CPV). If an incorrect statistical model is applied, inaccurate control limits may be set, leading to either false alarms (unnecessary interventions) or missed process deviations. This has serious implications for compliance with regulatory guidelines and the overall reliability of microbial monitoring.
By leveraging goodness-of-fit tests, variance analysis, and real-world data, this article demonstrates how to select the most appropriate statistical model. The R scripts for the case studies mentioned above can be freely downloaded at https://github.com/rbonfichi/microbial-counts. The conclusions emphasize that there is no universal approach to microbial count monitoring: choosing the correct distribution and control chart is essential for both short-term monitoring and long-term process verification. Through this approach, practitioners can ensure that microbial control limits are scientifically justified, regulatory-compliant, and capable of supporting continuous quality improvement in pharmaceutical and controlled environments.
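A quick first screen for the two departures from the Poisson model described above can be sketched in Python. This is only a rough check under stated assumptions, not the goodness-of-fit procedure used in the article: the counts are hypothetical, and the Negative Binomial size parameter is a simple method-of-moments estimate.

```python
import math
import statistics


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if overdispersed."""
    return statistics.variance(counts) / statistics.fmean(counts)


def zero_excess(counts):
    """Observed share of zeros minus the share expected from a Poisson
    distribution with the same mean, exp(-mean)."""
    mean = statistics.fmean(counts)
    return counts.count(0) / len(counts) - math.exp(-mean)


def nb_size_moments(counts):
    """Method-of-moments Negative Binomial size k, from var = mean + mean**2 / k.

    Returns None when the variance does not exceed the mean, i.e. when the
    data give no sign of overdispersion.
    """
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)
    if var <= mean:
        return None
    return mean ** 2 / (var - mean)


# Hypothetical CFU counts from twenty environmental monitoring samples.
cfu = [0, 0, 0, 1, 0, 2, 0, 0, 5, 0, 1, 0, 0, 8, 0, 0, 3, 0, 0, 1]

print(f"dispersion index: {dispersion_index(cfu):.2f}")
print(f"excess zero share: {zero_excess(cfu):.2f}")
print(f"NB size (moments): {nb_size_moments(cfu)}")
```

A dispersion index well above 1 points toward the Negative Binomial, and a clearly positive excess of zeros points toward a zero-inflated model; either finding should then be confirmed with a formal fit.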


ChatGPT-4o: A Powerful Tool to Quickly Identify Anomalous Lots in a Dataset

05/27/2024

Abstract

This article explores the application of ChatGPT-4o, an advanced artificial intelligence tool, in the field of pharmaceutical Quality Control. Using a dataset comprising analytical results from thirty-one production batches of a hypothetical active ingredient, the study demonstrates how ChatGPT-4o can quickly identify and efficiently interpret anomalies within complex datasets. Leveraging Principal Component Analysis (PCA), the AI not only identified anomalous batches, but also provided insights into the reasons behind such anomalies (for example, higher impurity levels or changes in solvent residues). Furthermore, the AI hypothesized potential problems in the production process or in the quality of raw materials, based on significant deviations observed in certain batches.
The results highlight the ability of artificial intelligence to make data interpretation easily accessible. However, the study also underscores the importance of statistical knowledge for formulating detailed questions and understanding the answers generated by artificial intelligence. Ultimately, ChatGPT-4o has proven to be a powerful tool for improving the efficiency and effectiveness of data review processes, such as those for Annual Product Quality Reviews (APQRs).


Monte Carlo method: a useful tool for the simulation of pharmaceutical processes

04/10/2024

Abstract

In the precision-driven world of pharmaceuticals, where safety and regulatory compliance are paramount, simulation methods stand out for their ability to predict and optimize complex processes. This post focuses on the key role of Monte Carlo simulations, a statistical method that transcends guesswork, providing a robust framework for decision making in the pharmaceutical industry.
The concept of the “Monte Carlo Method” and how it works are introduced using simple analogies. The introduction is completed with some historical data and the advantages/disadvantages of this approach.
Five case studies are then presented, each covering an operation or situation typical of the pharmaceutical industry:
• crystallization
• production of an API
• micronization
• robustness of the analytical method
• stability studies
These examples, although greatly simplified for reasons of clarity, clearly show the practical usefulness and versatility of Monte Carlo simulations in different scenarios in the pharmaceutical sector.
Each case study is illustrated with graphs, and the results are discussed because practical decisions depend on them.
The R scripts for the case studies mentioned above can be freely downloaded at:
github.com/rbonfichi/montecarlo simulation
In conclusion, this post aims to demonstrate how, once again, statistical methods applied to pharmaceutical control and production improve the reliability and efficiency of processes while reducing costs.


Elements of Statistics for the Pharmaceutical Quality Control using Microsoft Excel®

02/02/2024

Abstract

Every day, in the field of pharmaceutical manufacturing and control, enormous quantities of data are produced, which remain, for the most part, underutilized. A statistical approach enables the transformation of such often disorganized data into useful information, facilitating a better understanding, utilization, and improvement of the processes that generated them.
Microsoft Excel® is undoubtedly the simplest, most widespread, and commonly used program for "data management" in companies, including those in the pharmaceutical field. Although it was not created and developed for specific applications in the statistical domain, Excel® allows for significant achievements if its full potential is exploited.
The purpose of these slides is to demonstrate, through simple but meaningful examples, how even with this almost "zero cost" program, it is possible to extract a wealth of information from your data and, who knows, perhaps even spark a passion for statistics. Indeed, once the enormous potential of this discipline is understood, and with a small investment, the desire to delve deeper and transition to more specific software can become a natural progression.
In order to stimulate this interest and show how much Excel®, despite its intrinsic limitations, can offer, numerous common examples from daily production and control practice have been compiled. Application examples include assessments and decision-making related to trends in analytical parameters and yields, the impact of process parameters on production, deviation investigations, out-of-specification (OOS)/out-of-trend (OOT) results, supplier validation, and more.


Continued Process Verification: a Practical Approach

01/10/2024

Abstract

Continued (or Ongoing) Process Verification is a structured approach that allows a company to monitor the production process and make the necessary changes to the process and/or control strategy, as appropriate.
According to Dr. Shewhart, all manufacturing processes, and therefore also chemical-pharmaceutical and pharmaceutical ones, show a "controlled" or "natural" variability, to which is often added an "uncontrolled variability" attributable to so-called “special” causes.
Furthermore, all manufacturing processes tend to deviate from "ideal conditions" in which only "natural variability" characterizes them.
It is precisely for these reasons that since 2011 the FDA, and since 2015 the EMA, strengthened by the ICH Q8, Q9, Q10, Q11 and Q12 guidelines, have encouraged manufacturers of both APIs and finished pharmaceutical forms to adequately control the variability of the manufacturing processes throughout their life cycle in order to prevent dangerous deviations in the quality of the finished product.
In a nutshell, this is the meaning of Continued Process Verification (abbreviated as CPV) discussed in the FDA guideline on Process Validation (2011) and Annex 15 of Eudralex Vol. IV (2015).
Starting from the indications contained in these documents, the following slides show, through simple but significant examples and using the appropriate statistical tools, how CPV can be handled in practice.
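One of the standard tools for separating "natural" variability from special causes is the Shewhart individuals chart. The Python sketch below is a minimal illustration with hypothetical batch results; the slides may well use other charts or rules. It computes the usual limits from the average moving range and flags the points that fall outside them.

```python
import statistics


def individuals_chart_limits(values):
    """Lower limit, centre line and upper limit of an individuals chart.

    The limits are centre +/- 2.66 * average moving range, where
    2.66 = 3 / d2 and d2 = 1.128 for moving ranges of two consecutive points.
    """
    centre = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar


def out_of_control(values):
    """Indices of the points lying outside the chart limits."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]


# Hypothetical results for ten consecutive batches; the sixth is a spike.
batches = [10.0, 10.2, 9.9, 10.1, 10.0, 13.0, 10.1, 9.8, 10.0, 10.2]

lcl, centre, ucl = individuals_chart_limits(batches)
print(f"LCL = {lcl:.2f}, centre = {centre:.2f}, UCL = {ucl:.2f}")
print("points outside limits:", out_of_control(batches))
```

In routine use the limits would be fixed from a baseline period and then applied to new batches, rather than recomputed from data that already contain the suspect point.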


Bootstrap using R: a useful approach for handling chunky data

09/04/2023

Abstract

The term 'chunky data' was coined by Dr. Wheeler in the 90s to describe data that has been measured "in increments too large for the task at hand" or that result from "rounding or truncating experimental measurements". This type of data often occurs when experimental values must be reported in compliance with pre-established specifications that perhaps do not require decimal digits, or at most, only one. In the case of time series data that are naturally similar to each other (e.g., Annual Product Quality Reviews), and in the absence of decimal digits that differentiate them, it is common to find values that are repeated many times identically. This type of data, which can be clearly visualized using a probability plot or an individual value plot, leads to a substantial reduction in the variability of the dataset. As a result, a dataset may not follow a normal distribution, even though there is no scientific reason for this deviation. However, the non-normality of the datasets can represent an obstacle to the application of certain statistical tests that require the normality of the data.
A simple way to eliminate the problem caused by chunky data is to repeat the measurements with suitable tools or to report the measurements with the decimal places that were dropped during rounding. Unfortunately, this is often not feasible, for example when comparing measurements from two laboratories that have used different data reporting criteria. In these cases, the absence of normality makes it impossible to correctly apply the statistical tests commonly used to compare the means and dispersions of two data series (e.g., Two-sample t-test or F-test for Equal Variances).
Bootstrapping, a nonparametric resampling technique, serves as an effective and easy-to-implement alternative to non-parametric tests (e.g., Mann-Whitney) for handling such data. Bootstrapping allows for the creation of many simulated samples from a single dataset, without making assumptions about the data's distribution. This technique can help in estimating the distribution of a population and can be used to make inferences about the mean and variance differences between two datasets, even when one or both are not normally distributed. This post demonstrates how to use a simple R script to implement a specific bootstrapping method, providing a quick and reliable solution.
Clearly, the approach presented here can be extended to compare two non-normally distributed datasets for reasons beyond the presence of chunky data. Typical examples include analytical parameters (e.g., related substances content) or critical process parameters that are naturally "limited" (the impurity content can never be less than zero) or arbitrarily constrained and are not normally distributed.
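The kind of comparison described above can be sketched in a few lines of Python. This is an illustration only, not the R script from the post: it uses a plain percentile bootstrap of the difference in means, and the two sets of rounded results are hypothetical.

```python
import random
import statistics


def bootstrap_mean_diff_ci(a, b, n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Each dataset is resampled independently with replacement, so no
    assumption is made about the shape of either distribution.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical assay results reported without decimals by two laboratories.
lab_a = [99, 99, 100, 100, 100, 100, 101, 100, 99, 100]
lab_b = [101, 101, 102, 101, 100, 101, 102, 101, 101, 102]

lower, upper = bootstrap_mean_diff_ci(lab_a, lab_b)
print(f"95% interval for the difference in means: ({lower:.2f}, {upper:.2f})")
```

If the interval excludes zero, the two means differ at roughly the chosen confidence level. Replacing `statistics.fmean` with `statistics.variance` and taking a ratio instead of a difference gives the analogous comparison of dispersions.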


Applied Statistics for QA & QC in a GMP environment

01/30/2023

Abstract

In the 2011 FDA guideline on Process Validation, the term "statistical" was already used 13 times and the message to pharmaceutical manufacturers was clear: use quantitative statistical methods whenever possible to keep processes under control so as to ensure their stability over time and consistency with initial validation.
The concept of "Continued Process Verification", introduced by the FDA Guidance on Process Validation, was subsequently also taken up by Eudralex's Annex 15 "Qualification and Validation", published in 2015, which also recommended that “statistical tools should be used, where appropriate, to support any conclusions with regard to the variability and capability of a given process and ensure a state of control”.
Other important regulatory documents published later (ICH Q10 and ICH Q12) have further reaffirmed the importance of using statistical tools, not only to better define the process control strategy but also to design processes adequately (Design of Experiments, Design Space, etc.), all with a view to reducing post-approval changes.
All this makes evident not only the multiple uses of statistical tools but also their strong practical impact.
The slides attached here, which were used for a two-day webinar held in June 2022, present, with a structured approach, numerous quantitative statistical tools applied to pharmaceutical manufacturing and control. Given the vastness of the subject, this material obviously cannot cover all topics. However, it provides an overview that should encourage the adoption of these tools, if only for the advantages, including economic ones, that they offer.


Elements of Acceptance Sampling by Attributes

09/20/2021

Abstract

Verifying whether a material supplied by a producer to a consumer, or by one department to another within the same company, meets pre-established requirements calls for a set of statistical techniques known as acceptance control. In general, acceptance control can be carried out “by attributes” or “by variables” and is mainly used to establish whether the lots subjected to the control can be accepted or rejected, not to determine their quality level. This post focuses on acceptance control by attributes, where the quality of the lot is measured by its percentage of defects. The three main sampling plan schemes (i.e., hypergeometric, binomial and Poisson) are discussed and practical application examples are presented. “Control by attributes” is then considered from the process standpoint using the appropriate control charts (i.e., p or np-charts and c or u-charts). The analysis of the topic is completed by a discussion of the ISO 2859-1 standard with some practical application examples. The ISO 2859-1 standard specifies an acceptance sampling system for inspection by attributes indexed in terms of Acceptance Quality Limit (AQL).
The ultimate purpose of this post is to draw attention to the fact that although sampling plans are challenging to design and implement, they can perform a much higher function than just "police control". The information they return is indeed invaluable, and it is a real waste of resources if, as often happens, it is simply filed and ignored.
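For a single sampling plan (sample size n, acceptance number c), the probability of accepting a lot follows directly from the binomial or hypergeometric scheme mentioned above. The Python sketch below evaluates both; the plan and the lot size are hypothetical and are not taken from the ISO 2859-1 tables.

```python
from math import comb


def prob_accept_binomial(n, c, p):
    """P(accept) when each sampled unit is defective with probability p:
    the probability of finding at most c defectives in a sample of n."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


def prob_accept_hypergeometric(lot_size, n, c, defectives):
    """P(accept) when n units are drawn without replacement from a finite
    lot that contains a known number of defectives."""
    total = comb(lot_size, n)
    favourable = sum(
        comb(defectives, d) * comb(lot_size - defectives, n - d)
        for d in range(c + 1)
    )
    return favourable / total


# Hypothetical plan: sample 80 units, accept the lot with at most 2 defectives.
n, c = 80, 2
for percent_defective in (0.5, 1.0, 2.5, 5.0, 10.0):
    pa = prob_accept_binomial(n, c, percent_defective / 100)
    print(f"{percent_defective:5.1f}% defective -> P(accept) = {pa:.3f}")

# Same plan on a finite lot of 1000 units containing 25 defectives.
print(f"finite lot: {prob_accept_hypergeometric(1000, n, c, 25):.3f}")
```

Plotting P(accept) against the percentage of defectives gives the operating characteristic curve of the plan, which makes the producer's and consumer's risks visible at a glance.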


How to extend the shelf life of an API? Look at its Stability Data from a Multivariate standpoint!

04/12/2021

Abstract

Stability studies are mandatory activities that, in general, are routinely conducted and equally routinely monitored as per official guidelines.
The traditional approach to stability studies is limited exclusively to recording the occurrence of a degradation process with the sole purpose of estimating a possible shelf life for the product. The objective is achieved by following the trend over time of a quantitative attribute, usually the assay value.
This approach, due to its univariate nature, is however unable to say anything about the possible causes of the degradation phenomenon and therefore cannot suggest a way to improve things.
Since other quality attributes besides assay (e.g., pH, water content, etc.) are also determined at each stability time point, adopting a new perspective, i.e., a multivariate approach, makes it possible to identify which of the measured parameters most influence the degradation process. This allows us to hypothesize improvement actions on the process aimed at reducing, if not minimizing, degradation and therefore, ultimately, extending the shelf life of the product itself.
In this post, stability data obtained under "accelerated conditions" were chosen as a case study precisely because, being available before the others (i.e., long term), they allow the degradation process to be investigated immediately.
Experimentally it was also observed that even with only the data of the third month it was possible to obtain a model similar to that obtained with the data of the sixth month. It is therefore reasonable to assume that the use of additional accelerated aging techniques (e.g., 40°C ≤ T ≤ 80°C and 10% ≤ RH ≤ 75%) will make the data available for analysis in an even shorter time frame.


ASEPTIC FILLING OF STERILE POWDERS: SOME ELEMENTS OF STATISTICAL PROCESS CONTROL AND PREVENTIVE MAINTENANCE

02/22/2021

Abstract

A precise and accurate dosing of sterile powders under aseptic conditions in vials still represents a challenge in the pharmaceutical field and this is even more true when it comes to small quantities of high-potency active substances.
To conduct this important operation of the pharmaceutical industry effectively and efficiently, microdosing machines are available that can fill up to over 20,000 vials per hour.
Among the various filling methods available, the one that uses a vacuum / pressure system is very popular.
The discs of the microdosing machine, and the chambers contained therein, are subjected to a continuous operational stress which leads to an inevitable deterioration of their performance.
To what extent is this deterioration acceptable?
When should preventive actions be taken to limit it?
These questions are answered by descriptive statistics which, thanks to a simple summary index, the coefficient of variation, make it possible to compare the variability of each dosing chamber over time, build a case history, set limits of acceptability and then indicate when it is time to intervene preventively.
Furthermore, statistical methods allow us to examine the filling process in even more detail, modeling it and verifying its consistency between the different dosing chambers and over time.
It is worth noting that the approach and methods presented here are applicable to similar processes, at least in some respects, such as compression to produce tablets, etc.
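The coefficient of variation used as the summary index above is simply the standard deviation expressed as a percentage of the mean. The Python sketch below computes it per dosing chamber and lists the chambers that exceed an acceptability limit; the fill weights, chamber labels, and the 3% limit are hypothetical values chosen for illustration.

```python
import statistics


def coefficient_of_variation(weights):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(weights) / statistics.fmean(weights)


def chambers_needing_attention(fill_weights_by_chamber, cv_limit_percent):
    """Chamber labels whose coefficient of variation exceeds the limit."""
    return sorted(
        chamber
        for chamber, weights in fill_weights_by_chamber.items()
        if coefficient_of_variation(weights) > cv_limit_percent
    )


# Hypothetical fill weights (mg) for three dosing chambers.
fill_weights = {
    "chamber 1": [250.1, 249.6, 250.4, 249.9, 250.2],
    "chamber 2": [251.0, 248.7, 250.9, 249.2, 250.5],
    "chamber 3": [262.0, 241.5, 255.3, 238.9, 259.8],
}

for chamber, weights in fill_weights.items():
    print(f"{chamber}: CV = {coefficient_of_variation(weights):.2f}%")
print("exceeding 3% limit:", chambers_needing_attention(fill_weights, 3.0))
```

Tracking these values over successive production runs builds the case history from which a preventive-maintenance limit can be set.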


PRINCIPAL COMPONENT ANALYSIS AND CLUSTER ANALYSIS AS STATISTICAL TOOLS FOR A MULTIVARIATE CHARACTERIZATION OF PHARMACEUTICAL RAW MATERIALS

12/14/2020

Abstract

Numerous factors contribute to the variability of the pharmaceutical industry processes and among these the raw materials play a primary role as they often come from different sources that use different production processes.
Raw materials characterization therefore plays a fundamental role in terms of Quality which, by its nature, is "the enemy of variability".
Multivariate Statistical Analysis of Data (MVDA), beyond its mathematical complexity, is here presented as a powerful and practical tool for the study and classification of raw materials.
Thanks to the use of multivariate techniques such as Principal Component Analysis (PCA) or Cluster Analysis (CA), it is possible to graphically represent each lot, defined by the values of the different analytical parameters that characterize it, as a point in a Cartesian diagram whose coordinates are the principal components. Since these components are built to capture the variability in the data, these graphs reveal characteristics that would escape other types of analysis and therefore make it possible to catalog the lots based on their degree of intrinsic homogeneity and to identify any anomalous behavior. This approach can therefore be used both initially, to characterize the incoming raw materials, and subsequently, in the case of any anomalies, to see where the raw materials of the batches under investigation were located compared to those that had not caused problems.
The techniques that have been detailed here can also be extended to other typical situations in the pharmaceutical industry such as, for instance:
• comparative evaluation of finished product lots, for example for the purposes of Annual Product Quality Review (APQR).
• comparative evaluation of series of measurements performed by different operators, etc.
Once again, statistical methods show how it is possible to "simplify complexity" and extract practical and "ready-to-use" knowledge from complex datasets by capturing their information content.
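To make the idea of "each lot as a point whose coordinates are the principal components" concrete, the Python sketch below computes the first principal component of standardized data by power iteration and scores each lot on it. It is a deliberately minimal illustration: the lot data are hypothetical, only the first component is extracted, and a real analysis would normally rely on a statistics package.

```python
import math
import statistics


def standardize(rows):
    """Centre each column on its mean and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_principal_component(rows, n_iter=500):
    """Loadings of the first principal component and the score of each row."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    # Correlation matrix of the original variables.
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]
    # Power iteration; an uneven start vector makes it unlikely to be
    # orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(x * loading for x, loading in zip(row, v)) for row in z]
    return v, scores


# Hypothetical lots: impurity (%), residual solvent (ppm), water content (%).
lots = [
    [0.10, 200, 0.50],
    [0.12, 210, 0.52],
    [0.11, 205, 0.49],
    [0.09, 195, 0.51],
    [0.10, 202, 0.50],
    [0.30, 400, 0.90],
]

loadings, scores = first_principal_component(lots)
for index, score in enumerate(scores, start=1):
    print(f"lot {index}: PC1 score = {score:+.2f}")
```

A lot whose score lies far from the others on the first component is a candidate anomaly, and the loadings indicate which analytical parameters drive the separation.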


MULTIPLE LINEAR REGRESSION: A POWERFUL STATISTICAL TOOL TO UNDERSTAND AND IMPROVE APIs MANUFACTURING PROCESSES

10/26/2020

Abstract

It is known that, over time, all production processes tend to deviate from their initial conditions, for many different reasons such as changes in materials, personnel, environment, etc.
This variability in the processes, which often goes unnoticed, is nevertheless well captured by the data that Quality Control systematically collects for batch release purposes.
If these data are analyzed using Multiple Linear Regression (MLR), they reveal a lot regarding the manufacturing processes that generated them.
This process knowledge is of great practical use to the Company as it makes it possible to:
• understand which are the parameters that most affect the product quality and how they interact with each other,
• establish whether the parameters that are controlled are really the ones we need or, instead, which ones would be better to consider,
• define / improve a product control strategy based on experimental data and quantitative models rather than speculation,
• define and graphically represent the design space (ICH Q8) inherent to the production process considered,
• identify possible ways to improve process performance and scientifically pilot this improvement,
• mitigate the Regulatory impact in case of changes.
This post details, step by step, how this ready-to-use process knowledge can be obtained from readily available experimental data.


QUALITY METRICS AND DATA CONSISTENCY – Part 2

08/01/2020

Abstract

This second part is the continuation and completion of the previous one.
In this second post the points dealt with are:

CASE STUDY 3:
Capability Analysis: metrics for stable/mature processes (Cp and Cpk) and metrics for new processes (Pp and Ppk)
CASE STUDIES 4/5:
Probabilistic methods for a quick evaluation of the manufacturing process (Standardized Normal distribution, Poisson and Binomial distributions)
CASE STUDY 6:
- Non-normally distributed processes: impurities content, microbial counts, Particle Size Distributions (PSD), black particles (or black specks)
- Normalization of non-normal data using mathematical transformations (logarithm, square root, reverse or reciprocal)
- Johnson Transformations
CASE STUDY 7:
Multivariate methods: a different way to look at Quality Control data!
CONCLUSIONS:
- Quality Metrics are easy-to-use quantitative indicators that make it possible to capture the variability of products / processes, quantify it and therefore ensure Quality.
- Quality Metrics provide a “quantitative knowledge” of the process that makes it possible to manage anomalous events (OOT, OOS, deviations, etc.), either preventing or justifying them, and to convey awareness of what is done and the reliability of the processes used.
All this is summed up in two words: ECONOMIC ADVANTAGE!
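As a small illustration of the metrics named in Case Study 3 and of the normal-distribution estimate from Case Studies 4/5, the Python sketch below computes Cp/Cpk from a short-term sigma and Pp/Ppk from the overall sigma. The data and specification limits are hypothetical, and estimating the short-term sigma from the average moving range (divided by 1.128) is one common convention for individual values, assumed here rather than taken from the post.

```python
import statistics
from statistics import NormalDist


def capability_indices(values, lsl, usl):
    """Cp and Cpk (short-term sigma) alongside Pp and Ppk (overall sigma)."""
    mean = statistics.fmean(values)
    sd_overall = statistics.stdev(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sd_within = statistics.fmean(moving_ranges) / 1.128  # d2 for ranges of 2

    def index(sd):
        return (usl - lsl) / (6 * sd)

    def index_k(sd):
        return min(usl - mean, mean - lsl) / (3 * sd)

    return {
        "Cp": index(sd_within),
        "Cpk": index_k(sd_within),
        "Pp": index(sd_overall),
        "Ppk": index_k(sd_overall),
    }


def expected_oos_fraction(values, lsl, usl):
    """Fraction expected outside specification if the data were normal."""
    dist = NormalDist(statistics.fmean(values), statistics.stdev(values))
    return dist.cdf(lsl) + (1 - dist.cdf(usl))


# Hypothetical assay results (%) and specification limits.
assay = [99.8, 100.3, 99.6, 100.1, 100.4, 99.9, 100.0, 99.7, 100.2, 100.1]

for name, value in capability_indices(assay, lsl=98.0, usl=102.0).items():
    print(f"{name} = {value:.2f}")
print(f"expected OOS fraction = {expected_oos_fraction(assay, 98.0, 102.0):.2e}")
```

A marked gap between Cpk and Ppk suggests that the process mean drifts over time, which is why the two families of indices are treated separately for mature and new processes.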