by Riccardo Bonfichi
PRINCIPAL COMPONENT ANALYSIS AND CLUSTER ANALYSIS AS STATISTICAL TOOLS FOR A MULTIVARIATE CHARACTERIZATION OF PHARMACEUTICAL RAW MATERIALS
Abstract: Numerous factors contribute to the variability of pharmaceutical industry processes. Among these, raw materials play a primary role, as they often come from different sources that use different production processes.
MULTIPLE LINEAR REGRESSION: A POWERFUL STATISTICAL TOOL TO UNDERSTAND AND IMPROVE APIs MANUFACTURING PROCESSES
Abstract: It is known that, over time, all production processes tend to deviate from their initial conditions. This happens for many different reasons, such as changes in materials, personnel, environment, etc.
QUALITY METRICS AND DATA CONSISTENCY – Part 2
Abstract: This second part is the continuation and completion of the previous one.
QUALITY METRICS AND DATA CONSISTENCY – Part 1
Abstract: In 2002, FDA launched the “Pharmaceutical cGMPs for the 21st Century” initiative with the aim of promoting a modern, risk- and science-based approach to production. In 2015, still in that context, FDA asked the industry for input to define an “FDA Quality Metrics program”, and in December 2019 it announced that the implementation of a “Quality Metrics Program” had become a priority. Taking its cue from these FDA stimuli, this post and the next deal with the use of quantitative tools (or Quality Metrics) for understanding, monitoring and possibly improving pharmaceutical manufacturing processes. Real case studies that show the practical application of Quality Metrics to typical QA / QC topics are discussed, and their statistical analysis is detailed step by step. In practice, it is shown how, from data normally available at the company, it is possible to easily extract useful information on the state of the processes and, above all, to predict their possible outcome. It is exactly this combination of two aspects, one descriptive and the other predictive, that makes it possible to really know a given process, control it and possibly even improve it. This knowledge is also useful for managing issues like OOS, OOT, deviations, etc. In fact, poor knowledge of the process and of its quality indicators can lead one to consider anomalous what is not. Given the number of Quality Metrics considered and the breadth of the case studies discussed, the topic was split into two parts. In this first post the points dealt with are:
Basics of Statistical Risk Analysis
Abstract: Risk is an essential part of daily life, and even society as a whole needs to take risks to continue growing and developing. Risk management is the process of identifying, analyzing and responding to risk factors. According to ICH Q9, Risk Assessment consists of the identification of hazards and the analysis and evaluation of risks associated with exposure to those hazards. Apart from a few exceptions (e.g., quantitative FTA), most of the risk analysis tools commonly used in the pharmaceutical field (e.g., FMEA) are basically subjective. However, in some cases, there are statistical techniques that allow us to assess the extent of the risk associated with some decisions. A typical example is the decision on whether a lot conforms or not, based on the analysis of a sample taken from it. In such a decision two parties must be considered, the PRODUCER and the CUSTOMER (or CONSUMER), who run two different types of risk. The PRODUCER runs the risk of rejecting a “good lot”, while the CUSTOMER (or CONSUMER) runs the risk of accepting a “not compliant” or “poor quality” product. This post briefly addresses this topic.
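The two risks described above can be made concrete with a minimal sketch. It assumes a single sampling plan by attributes (inspect n units, accept the lot if at most c are defective) and a binomial model. The plan parameters and the two quality levels (`aql`, `ltpd`) are hypothetical values chosen only for illustration; they are not taken from the post.

```python
from math import comb

def prob_accept(n, c, p):
    """Probability of accepting a lot whose true defective fraction is p,
    under a single sampling plan: inspect n units, accept if at most c
    are defective. Binomial model (lot assumed much larger than sample)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

# Hypothetical plan and quality levels, for illustration only.
n, c = 50, 1   # sample size and acceptance number
aql = 0.01     # defective fraction of a lot considered "good"
ltpd = 0.10    # defective fraction of a lot considered "poor quality"

# Producer's risk: a good lot is rejected.
producer_risk = 1 - prob_accept(n, c, aql)
# Consumer's risk: a poor-quality lot is accepted.
consumer_risk = prob_accept(n, c, ltpd)

print(f"Producer's risk: {producer_risk:.3f}")
print(f"Consumer's risk: {consumer_risk:.3f}")
```

Changing `n` and `c` shifts the balance between the two risks, which is the trade-off a sampling plan has to settle.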
Regulatory Technical Writing - Labor Ergo Scribo!
Abstract: Those who work must necessarily write! The aims are many: to communicate the results of one's studies, to give operating instructions, to respond to requests, etc. In all cases, however, if the message contained in the writing does not reach the recipient, the entire communication process is frustrated, and the consequences can be significant. To appreciate this, it is sufficient to consider that at least a third of an executive's time is spent writing documents, and that the quality of a given job, and the choice to continue it, interrupt it, finance it, etc., are often determined solely by the document that illustrates it! The focus of this presentation is therefore to analyze the structure of a technical document and provide practical suggestions for its preparation. Writing, however, is much more than this, and therefore the presentation considers, more generally, what it means to write and how to do it.
Solvents Classification using a Multivariate Approach: Cluster Analysis.
Abstract: This post continues and completes the analysis, begun in the previous post, of a database consisting of 64 solvents, each described by eight physico-chemical descriptors. The subject of this study is the application of Cluster Analysis with the intention of finding groups in data, i.e., identifying which observations are alike and categorizing them into groups, or clusters. As clustering is a broad set of techniques, this study focuses only on the so-called hard clustering methods, i.e., those assigning observations with similar properties to the same group and dissimilar data points to different groups. Two types of algorithms have been considered: hierarchical and partitional. Regardless of the chosen technique, the experimental evidence indicates the presence, in the database, of three main groups, each consisting of individuals categorized as similar to one another, and of a few isolated individuals dissimilar from the others. A similar finding was also obtained in the previous post using 2D contour plots. A closer examination of these three main groups of solvents shows a finer structure consisting of smaller groups of highly similar individuals, e.g., members of a given chemical family (such as alcohols or chlorinated hydrocarbons) or chemical entities sharing common characteristics (such as aprotic dipolar solvents).
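To illustrate the two families of hard clustering named above, here is a minimal sketch in plain Python. The data are made up (twelve points, two descriptors) and are not the solvent database. The partitional method is a basic k-means; the hierarchical method is an agglomerative procedure with average linkage. Both choices are assumptions made for illustration, since the abstract does not say which specific algorithms were used.

```python
from math import dist

# Made-up observations with two descriptors each; three groups are visible.
points = [
    (1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (1.1, 1.3),
    (5.0, 5.2), (5.3, 4.9), (4.8, 5.1), (5.1, 4.7),
    (9.0, 1.0), (9.2, 1.3), (8.8, 0.8), (9.1, 1.1),
]

def centroid(group):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(group) for coord in zip(*group))

def kmeans(data, k, iterations=20):
    """Partitional clustering: Lloyd's algorithm with a naive initialization."""
    centers = [data[i * len(data) // k] for i in range(k)]  # evenly spaced picks
    for _ in range(iterations):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in data]
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels

def agglomerative(data, k):
    """Hierarchical clustering: repeatedly merge the two closest clusters
    (average linkage) until only k clusters remain."""
    clusters = [[i] for i in range(len(data))]

    def linkage(a, b):
        return sum(dist(data[i], data[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(data)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels

def as_partition(labels):
    """Turn a label list into a set of index groups, ignoring label names."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return {frozenset(g) for g in groups.values()}

same = as_partition(kmeans(points, 3)) == as_partition(agglomerative(points, 3))
print("Both methods find the same three groups:", same)
```

On well-separated data like these, the two approaches agree; on real data they can differ, which is why comparing them is informative.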
Solvents Classification using a Multivariate Approach: Correlation and Principal Component Data Analysis.
Abstract: The identification of data-driven criteria for making a conscious choice of solvents for practical applications is a rather old issue in the chemical field. Solvents, in fact, are mainly selected based on the chemist's experience and intuition, driven by parameters such as polarity, basicity and acidity. As early as 1985, at least two research groups approached the issue of solvent selection using multivariate statistical methods. These scientists, using different databases, each based on different types of physicochemical descriptors, obtained different classification patterns. In this post, one of those databases has been chosen and the data analysis process has been repeated, detailing it systematically. This post deals with the first part of the process: it covers the intercorrelation among the physicochemical descriptors used to characterize the solvents under study, and Principal Component Analysis. The correlation found makes it possible to capture 70% of the initial data variability using just two principal components, the first of which is related to the “polarity/polarizability” and “lipophilicity” of molecules and the second to the “strength of intermolecular forces”. The use of these two principal components suggests the possibility of grouping solvents into aggregates (or clusters) of similar individuals, and this aspect will be covered in the following post.
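The idea of capturing most of the variability with two principal components can be sketched as follows. The example uses synthetic data (60 observations of 5 descriptors driven by two hidden factors), not the solvent database, so the percentages it prints are unrelated to the 70% reported in the post. Working on the correlation matrix and finding the leading eigenvalues by power iteration with deflation are assumptions chosen to keep the sketch dependency-free.

```python
import random
from math import sqrt

rng = random.Random(0)

# Synthetic data: three descriptors follow hidden factor t, two follow u.
rows = []
for _ in range(60):
    t, u = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([t + rng.gauss(0, 0.3) for _ in range(3)]
                + [u + rng.gauss(0, 0.3) for _ in range(2)])

def standardize(data):
    """Center each column and scale it to unit sample standard deviation."""
    cols = list(zip(*data))
    n = len(data)
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]

def correlation_matrix(data):
    """Correlation matrix computed from the standardized columns."""
    z = standardize(data)
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]

def top_eigenpairs(matrix, count, iterations=500):
    """Leading eigenvalues and eigenvectors of a symmetric matrix,
    found by power iteration with deflation."""
    m = [row[:] for row in matrix]
    p = len(m)
    pairs = []
    for _ in range(count):
        v = [1.0 / sqrt(p)] * p
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        eigenvalue = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((eigenvalue, v))
        # Deflation: remove the component just found before the next pass.
        for i in range(p):
            for j in range(p):
                m[i][j] -= eigenvalue * v[i] * v[j]
    return pairs

corr = correlation_matrix(rows)
pairs = top_eigenpairs(corr, 2)
# For a correlation matrix the total variance equals the number of variables.
explained = [eigenvalue / len(corr) for eigenvalue, _ in pairs]

print("Explained variance, PC1 and PC2:", [round(e, 3) for e in explained])
print("Cumulative:", round(sum(explained), 3))
```

The eigenvector entries (loadings) show which descriptors drive each component, which is how components come to be interpreted in chemical terms.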
A different way to look at pharmaceutical Quality Control data: multivariate instead of univariate.
Abstract: In the pharmaceutical industry, Quality Control (QC) data are typically arranged in data tables, each row of which refers to a specific production lot and contains the results of different types of measurements (chemical and microbiological). Since there is a specific data table for each active chemical entity, or dosage form, and since all lots listed therein are manufactured using the same approved process, the data table contains the “analytical fingerprint” of that specific manufacturing process. In spite of their tabular form, QC data are usually reviewed, evaluated and trended in a univariate mode, i.e., each type of data is analyzed individually using statistical tools such as control charts, box plots, etc. The dataset is therefore studied “by columns”. In this post, a different way to analyze QC data is proposed, i.e., a multivariate approach that improves upon separate univariate analyses of each variable by using information about the relationships between the variables. Moreover, the combination of multivariate methods with the power of the programming language R and its unsurpassed graphic tools allows data to be analyzed mainly by relying on graphics and, as stated by Chambers et al., “there is no statistical tool that is as powerful as a well-chosen graph”. This post shows how, using R for combined multivariate data analysis and visualization, the information contained in a QC chemical dataset can be easily extracted and converted into “knowledge ready to use”.
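A small sketch shows what a “by columns” review can miss. It is written in Python rather than R, and it uses made-up values for two correlated QC attributes. The Mahalanobis distance and the chi-square reference limit are assumptions chosen for illustration, not necessarily the methods used in the post; with so few lots the limit is only a rough approximation.

```python
from math import log, sqrt

# Made-up historical results for two positively correlated QC attributes.
history = [
    (9.0, 9.1), (9.5, 9.4), (10.0, 10.1), (10.5, 10.4), (11.0, 11.1),
    (9.2, 9.0), (10.8, 10.9), (10.2, 10.0), (9.8, 9.9), (10.0, 10.0),
]
# Hypothetical new lot: each value is ordinary, the combination is not.
new_lot = (10.8, 9.2)

n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n
s_xx = sum((x - mean_x) ** 2 for x, _ in history) / (n - 1)
s_yy = sum((y - mean_y) ** 2 for _, y in history) / (n - 1)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in history) / (n - 1)

def mahalanobis_sq(point):
    """Squared Mahalanobis distance from the historical mean, using the
    closed-form inverse of the 2x2 sample covariance matrix."""
    dx, dy = point[0] - mean_x, point[1] - mean_y
    det = s_xx * s_yy - s_xy ** 2
    return (dx * dx * s_yy - 2 * dx * dy * s_xy + dy * dy * s_xx) / det

# Univariate view: z-score of each attribute on its own.
z_x = (new_lot[0] - mean_x) / sqrt(s_xx)
z_y = (new_lot[1] - mean_y) / sqrt(s_yy)

# Multivariate view: distance that accounts for the correlation.
d2 = mahalanobis_sq(new_lot)
# Chi-square quantile with 2 degrees of freedom at 99.73% (exact formula).
threshold = -2 * log(0.0027)

print(f"z-scores: {z_x:.2f}, {z_y:.2f} (both inside +/-3)")
print(f"Squared Mahalanobis distance: {d2:.1f}, limit: {threshold:.1f}")
```

Each attribute of the new lot passes its own three-sigma check, yet the pair breaks the historical relationship between the two variables, and only the multivariate distance reveals it.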