You may be familiar with Design of Experiments (DoE) approaches for developing your bioprocess: you vary process parameters such as temperature, pH and feed, and regress their effect on product performance or product quality. But you may also have experienced a lot of variability that you simply cannot explain.

**What are Random Effects?**

When defining and training a data-driven, statistical model, learned parameters are commonly assumed to converge on a certain point, provided that the other parameters do not change. There might be some residual variability associated with such a point, depending on the number of samples used to train the model, but in principle it represents a fixed property of the modelled process. Such parameters are called fixed effects. An example of a fixed effect might be the temperature of a chemical reaction.

Random effects, in contrast, are parameters that are random and uncontrollable by nature. Conceptually, they are not assumed to converge on the same value for repeated experiments and take on a constant value only for a specific subset of the data, called a “block”. To stay with the example of a chemical reaction: different raw materials might be used for different sets of experiments. It is not known how those raw materials affect the model, but it would be a sound assumption that they influence the model prediction, or at least the variability of the response, in some way. In most cases random blocks are represented as categorical variables in the model data (e.g., “Block A”, “Block B”, etc.).
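As a minimal sketch, such data might look like the following. All numbers, block names and effect sizes are invented for illustration; each raw-material lot shifts the response by an amount that is unknown in practice:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Invented example data: temperature is a controlled fixed effect,
# while each raw-material lot ("block") adds an unknown random shift.
true_shift = {"Block A": -1.0, "Block B": 0.2, "Block C": 0.8}  # hidden in practice

frames = []
for block, shift in true_shift.items():
    temp = rng.uniform(30.0, 38.0, size=8)  # varied process parameter
    titer = 2.0 + 0.5 * temp + shift + rng.normal(0, 0.3, size=8)
    frames.append(pd.DataFrame({"temperature": temp, "block": block, "titer": titer}))

data = pd.concat(frames, ignore_index=True)
print(data.groupby("block")["titer"].mean())  # block-to-block shifts are visible
```

Plotting `titer` against `temperature`, coloured by `block`, would show three roughly parallel clouds of points: the fixed effect drives the common slope, the block drives the vertical offset.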

**Why are Random Effects Important?**

Incorporating random effects into regression models results in a more accurate and conservative estimate of model variability. In the context of biopharmaceutical manufacturing, this means that the distribution of a critical quality attribute in the drug substance computed by a model also contains the contribution of random effects. A wider distribution can potentially impact the out-of-specification probability of a production process and consequently patient safety. In some cases, the impact of random effects is even larger than that of the fixed effects (Figure 1).

**Handling Blocking Effects**

Arguably the most unfortunate choice when handling blocking or random effects is to ignore them entirely when modelling a process. This, of course, can have a large impact on both prediction and measures of variability. Blocks oftentimes affect the model significantly and omitting this information reduces its accuracy and applicability to real-world scenarios.

A method routinely used in the industry is to incorporate random blocks as categorical fixed effects into the model. Block information is encoded in a way that enables the prediction of responses for an average block, i.e., the average of all blocking effects used for training the model (this is called “sum deviation encoding”). The advantage of this method is that blocks are considered when training the model and that their effects are quantified. It also makes it possible to determine the significance of a block, i.e., whether there is statistical evidence for including the factor in the model at all. However, the drawback of modelling random blocks as fixed effects is that their impact on variability is underestimated, as their random nature is not taken into account. Figure 2 illustrates this effect by contrasting a model that includes the random effect (LMM) with a model that does not (OLS).
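A sketch of this fixed-effect approach, using invented data: in statsmodels, `C(block, Sum)` applies sum-to-zero (deviation) coding, so the block coefficients average out and the intercept plus the temperature slope describe the response for an “average” block.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Invented data: three raw-material lots shift the response by an unknown amount.
frames = []
for block, shift in {"A": -1.0, "B": 0.2, "C": 0.8}.items():
    temp = rng.uniform(30.0, 38.0, size=8)
    titer = 2.0 + 0.5 * temp + shift + rng.normal(0, 0.3, size=8)
    frames.append(pd.DataFrame({"temperature": temp, "block": block, "titer": titer}))
data = pd.concat(frames, ignore_index=True)

# Sum (deviation) coding: block effects sum to zero, so predictions without a
# block contribution correspond to the average block.
ols = smf.ols("titer ~ temperature + C(block, Sum)", data=data).fit()
print(ols.params)
```

Note that the block coefficients are estimated as fixed offsets; the residual variance reported by this model does not include any block-to-block contribution, which is exactly the underestimation described above.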

To resolve this problem, a special type of model can be employed that is specifically designed to handle both fixed and random effects: the linear mixed model. Blocks are considered as random variables and their impact on variability is computed independently from the fixed effect variability. Taking the sum of all those variance components gives a more accurate account of total model variability. This value is usually larger than the model variability calculated by the previously described method where blocks are modelled as fixed effects.
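A minimal mixed-model sketch, again on invented data (six hypothetical raw-material lots): statsmodels' `MixedLM` fits a random intercept per block, and the between-block and residual variance components can be summed to obtain total variability.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Invented data with six raw-material lots acting as random blocks.
frames = []
for i in range(6):
    shift = rng.normal(0, 1.0)  # random block effect, drawn per lot
    temp = rng.uniform(30.0, 38.0, size=8)
    titer = 2.0 + 0.5 * temp + shift + rng.normal(0, 0.3, size=8)
    frames.append(pd.DataFrame({"temperature": temp, "block": f"lot{i}", "titer": titer}))
data = pd.concat(frames, ignore_index=True)

# Linear mixed model: temperature as fixed effect, lot as random intercept.
lmm = smf.mixedlm("titer ~ temperature", data, groups=data["block"]).fit()

between_block_var = float(lmm.cov_re.iloc[0, 0])  # variance of the random effect
residual_var = lmm.scale                          # residual variance
total_var = between_block_var + residual_var      # sum of variance components
print(f"block: {between_block_var:.3f}  residual: {residual_var:.3f}  total: {total_var:.3f}")
```

Here `total_var` is the quantity that matters for downstream distribution and out-of-specification estimates; an OLS fit of the same data would attribute the block variation partly to fixed offsets and report only the residual part.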

**Benefits and Drawbacks of Linear Mixed Models**

In most cases, linear mixed models (LMM) should be the tool of choice when random effects are involved in a process model. As described above, their representation of variability gives a more accurate account of reality. Additionally, LMMs can be used to model more complex interactions between blocks, for example parallel or nested blocking structures, which is not possible in ordinary least squares models.
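As a hedged sketch of such a nested structure (site and batch names are invented; statsmodels' `vc_formula` argument is one way to express a batch-within-site variance component):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Invented nested structure: batches nested within manufacturing sites.
frames = []
for site in ["S1", "S2", "S3", "S4"]:
    site_shift = rng.normal(0, 1.0)
    for batch in ["b1", "b2", "b3"]:
        batch_shift = rng.normal(0, 0.5)
        temp = rng.uniform(30.0, 38.0, size=5)
        titer = 2.0 + 0.5 * temp + site_shift + batch_shift + rng.normal(0, 0.2, size=5)
        frames.append(pd.DataFrame({
            "temperature": temp, "titer": titer,
            "site": site, "batch": f"{site}-{batch}",
        }))
data = pd.concat(frames, ignore_index=True)

# Site is the grouping factor; batch-within-site enters as a variance component.
nested = smf.mixedlm("titer ~ temperature", data, groups=data["site"],
                     vc_formula={"batch": "0 + C(batch)"}).fit()
print(nested.summary())
```

This yields separate variance estimates for sites, for batches within sites, and for the residual, which an ordinary least squares model cannot provide.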

Overall, neither computation time nor limitations due to numeric optimizations should discourage the use of LMMs when used in a modern computing environment or toolbox. They provide a more accurate description of variance and constitute a flexible tool to model hierarchical blocking structures. Consequently, they are increasingly popular when characterizing processes, especially in the biopharmaceutical context where the prediction of distributions and out-of-specification probabilities is critical.

We cooperate with Werum IT-Solutions to provide the technology and methods of mixed models for process characterization. As part of the integrated process model (IPM) that models a manufacturing process spanning multiple operations, the variance introduced by a random effect in a certain step can be propagated through all consecutive steps until its influence can be seen in the final drug substance. As part of this ensemble model, LMMs are used in conjunction with OLS and mechanistic models to adapt to the specific properties of each manufacturing step. In the end, a statistically sound description of a critical quality attribute’s distribution can be simulated, accounting for all significant sources of variation.