AXIOMAS HERMETICOS - Curso completo de Magia Negra

In life sciences, a phenomenon is understood as a random event whose individual outcomes are uncertain but which is still predictable over a large number of repetitions (Baldi and Moore, 2014, Moore et al., 2012). It is important, however, to differentiate between possibility and probability (Riffenburgh, 2012). The possibility is 1 when a phenomenon can occur and 0 when it cannot. Probability describes how likely the phenomenon is to occur in a long series of events (Riffenburgh, 2012, Baldi and Moore, 2014). If a phenomenon cannot occur, then its probability is 0. Conversely, if a phenomenon occurs with 100% certainty, then its probability is 1. Thus, probability always lies between 0 and 1 (Riffenburgh, 2012).

Computer simulations of Neurospora crassa growth patterns imitate the random behavior of this organism. Estimating real-world probabilities for Neurospora crassa would require hundreds of thousands of experiments. In that respect, computer simulations are useful because they allow the analysis of long series of events and the consideration of various scenarios. Computer simulations also allow for more accurate and precise prediction of outcomes than standard laboratory observations, which give only a rough estimate of the associated probabilities (Baldi and Moore, 2014).

The number of observed occurrences within a particular range divided by the number of all occurrences is known as a relative frequency. It is analogous to the definition of probability, in which known possibilities, rather than observed occurrences, are counted. Consequently, relative frequencies can be viewed as probability estimators. By analogy, relative frequency distributions estimate probability distributions (Riffenburgh, 2012). In life sciences, it is often desirable to estimate how likely a phenomenon is to occur without any prior knowledge of its probability distribution. It is often impossible to obtain the full numerical information. For the filamentous fungi investigated in this study, for example, knowing the probability denominator would require including all branching Neurospora crassa in the world. It is much more realistic to find relative frequencies of occurrence and use them to estimate these probabilities. Suppose I wish to know the probability of a branching event as a function of branching distance. I can estimate it by dividing the number of branching events observed, e.g. 4, by the number of micrometers recorded, e.g. 287. Consequently, a relative frequency of 4/287 ≈ 0.0139 is an estimate of the branching probability.
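The branching estimate described above can be sketched in a few lines of Python. This is only an illustration: the function name `relative_frequency` is hypothetical, and the counts 4 and 287 are the example values from the text, not measured results.

```python
def relative_frequency(event_count: int, total: float) -> float:
    """Return event_count / total, the relative-frequency estimate of a probability."""
    if total <= 0:
        raise ValueError("total must be positive")
    if event_count < 0:
        raise ValueError("event_count must not be negative")
    return event_count / total


# Example values from the text: 4 branching events over 287 micrometers.
branching_estimate = relative_frequency(4, 287)
print(f"{branching_estimate:.4f}")  # prints 0.0139
```

The same function applies to any count divided by its total, for example the number of hyphae in a velocity bin divided by the sample size.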

As the number of observations increases, the relative frequency estimate is expected to move closer to the branching probability (Riffenburgh, 2012, Moore et al., 2012, Baldi and Moore, 2014, Wang and Bakhai, 2006). However, because branching events occur at random, it is occasionally possible to obtain a series of branching events whose relative frequency moves away from the branching probability. Over a large number of counts, the branching frequency nevertheless converges to the probability. The distribution of relative frequencies can be plotted by counting the number of occurrences in every category and dividing each count by the total sample size. In a relative frequency graph, the vertical axis ranges between 0 and 1. Because the variables measured for Neurospora crassa are continuous, the categories in the frequency plots are not self-evident and have to be specified. For example, approximately 100 000 apical extension velocities were recorded for Neurospora crassa growing on agar under laboratory conditions. This allows me to reduce the interval size from 7 µm/min to 0.7 µm/min, or even to 0.07 µm/min. As the interval size is reduced towards 0, a smooth curve is approached. This curve approximates the probability distribution of the apical extension velocity more precisely. Importantly, no natural intervals exist for continuous random variables (Riffenburgh, 2012, Baldi and Moore, 2014, Moore et al., 2012, Wang and Bakhai, 2006, Crawley, 2013, Crawley, 2005, Dytham, 2011, Kabacoff, 2011). Therefore, as the interval width approaches zero and the number of intervals approaches infinity, the probability distribution becomes fully smooth. A probability distribution has a central tendency and a spread, the latter being the measure of variability. A frequency distribution can be regarded as the data distributed along the axis of a particular variable, and it is a concept that supports basic preliminary analysis in biological sciences.
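The effect of narrowing the intervals can be sketched with simulated data. The example below is a minimal illustration under loud assumptions: the velocities are drawn from a normal distribution with an arbitrary mean and standard deviation, not taken from the measurements discussed here, and the function name `relative_frequencies` is hypothetical. Dividing each relative frequency by the bin width gives a density height, and it is these heights that trace an increasingly smooth curve as the bins narrow.

```python
import random


def relative_frequencies(values, low, high, bin_width):
    """Count values per bin on [low, high) and divide each count by the sample size."""
    n_bins = round((high - low) / bin_width)
    counts = [0] * n_bins
    for value in values:
        if low <= value < high:
            # min() guards against floating-point rounding at the upper edge.
            index = min(int((value - low) / bin_width), n_bins - 1)
            counts[index] += 1
    return [count / len(values) for count in counts]


rng = random.Random(0)
# ASSUMPTION: simulated velocities (normal, mean 20 and sd 5 µm/min), not real data.
velocities = [rng.gauss(20.0, 5.0) for _ in range(100_000)]

for width in (7.0, 0.7, 0.07):
    freqs = relative_frequencies(velocities, 0.0, 42.0, width)
    # Relative frequency divided by bin width is the height of the density estimate.
    densities = [freq / width for freq in freqs]
    print(f"width={width}: {len(freqs)} bins, "
          f"sum of relative frequencies={sum(freqs):.4f}, "
          f"max density={max(densities):.4f}")
```

Whatever the bin width, the relative frequencies sum to (at most) 1, while the individual relative frequencies shrink as the bins narrow. This is why the smooth limiting curve is usually drawn on a density scale.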
The number of values that fall into a defined bin (interval) is called the frequency for that bin. Often the frequency itself is not enough for making inferences and, therefore, it is more appropriate to use a relative frequency, defined as “the proportion of the sample in a bin” (Riffenburgh, 2012, Baldi and Moore, 2014, Moore et al., 2012). For example, the proportion of apical extension velocities between 15 µm/min and 20 µm/min for Neurospora crassa parent hyphae is 367/1925 = 0.19. The proportion of hyphae with velocities below 15 µm/min is 203/1925 = 0.11. For reporting purposes, these proportions can be expressed as percentages, 19% and 11% respectively. Distribution patterns vary for small samples and become more stable as the sample size (number of measurements) increases. The relative frequency calculated above is a characteristic of the sample. However, the primary interest is to obtain the characteristic of the whole population of hyphae, namely the probability distribution. That way I can tell, e.g., what is the probability that any

randomly chosen parent hypha of Neurospora crassa in Canada will extend with a particular apical extension velocity. This probability is the area under the probability distribution curve over the velocity interval of interest, divided by the total area. Because it is not possible to measure the apical extension velocity for the whole population of Neurospora crassa in Canada, I use the information from the samples provided by Prof. Roger Lew (Lew, 2015). Relative frequency counts estimate the probability distribution. If I took another sample from a different laboratory, the relative frequencies would be slightly different, but they would still be expected to fall into a similar range. I could use parametric statistics to differentiate between population (stable) characteristics and sample (varying) characteristics (Moore et al., 2012, Baldi and Moore, 2014). Parametric methods should indicate how well the estimate approximates the parameter. However, parametric methods can be used only when the data meet certain assumptions, and therefore it is sometimes better to use non-parametric methods instead. Another important distinction when describing phenotype data is discreteness vs. continuity. In a discrete probability plot, there is one position for every possibility, with a height between 0 and 1, and the sum of all of these heights equals 1. In a continuous probability plot, by contrast, the data are represented by a smooth, non-negative curve, and the area under this curve equals 1. In the next subchapters, I discuss in more depth the features of probability distributions, as well as the reasons why I used nonparametric statistical methods to describe the Neurospora crassa phenotype.
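As a rough numerical check of the discrete and continuous cases described above, the following sketch sums a discrete distribution and integrates a continuous one. Everything in it is an assumption made for illustration: the branch-count probabilities, the normal velocity distribution (mean 20 and standard deviation 5 µm/min), and the helper names `normal_density` and `area_under_curve` are hypothetical and do not come from the data.

```python
import math

# ASSUMPTION: hypothetical probabilities for 0, 1, 2 or 3 branches per segment.
branch_probabilities = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
discrete_total = sum(branch_probabilities.values())


def normal_density(x, mean, sd):
    """Height of the normal probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def area_under_curve(density, low, high, steps=10_000):
    """Approximate the integral of density over [low, high] with the trapezoidal rule."""
    step = (high - low) / steps
    total = 0.5 * (density(low) + density(high))
    for i in range(1, steps):
        total += density(low + i * step)
    return total * step


def velocity_density(v):
    # ASSUMPTION: hypothetical velocity distribution, mean 20 and sd 5 µm/min.
    return normal_density(v, 20.0, 5.0)


# Total area, taken over a range wide enough to hold essentially all of the curve.
total_area = area_under_curve(velocity_density, -20.0, 60.0)
# Probability of a velocity between 15 and 20 µm/min: area over that interval only.
interval_probability = area_under_curve(velocity_density, 15.0, 20.0)

print(f"sum of discrete probabilities: {discrete_total:.6f}")
print(f"total area under the density:  {total_area:.6f}")
print(f"P(15 <= velocity <= 20):       {interval_probability:.4f}")
```

In the discrete case the probabilities themselves are added, whereas in the continuous case a probability is obtained only for an interval, as the area under the curve over that interval.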
