LUCIFER de Ambrosius Graal
EL SUEÑO DE LUCIFER
Fungi simulations were run separately in three different modes: (1) ‘Actual Data’, (2) ‘Gauss’, and (3) ‘Kernel’. Simulated data were recorded and stored for monitoring and evaluation purposes. In this section, I compare the measured data with the recorded simulated data to validate the simulation results.
The process of generating values for the fungi simulations can be imagined as (1) drawing numbers from a hat with replacement (‘Actual Data’ simulation mode), (2) drawing numbers from the area under the normal curve (‘Gauss’ simulation mode), and (3) drawing numbers from the area under the kernel curve (‘Kernel’ simulation mode). Method (1) gives the highest precision, as the numbers drawn are exactly those captured in the real-world experiments with Neurospora crassa. However, it cannot produce any value that was not measured, so it does not reproduce the full variability of a biological system. Method (2) is computationally efficient, fast, and easy to apply for non-statisticians, because drawing numbers at random from a Gaussian distribution is a well-established and automated procedure. Its main drawback is that data for the majority of living systems are not normally distributed, or the distribution of the data is often unknown due to the small sample size. Method (3) is both accurate and precise, and it preserves the variability present in biological systems. However, it also has some drawbacks: it requires significant computational power, it is not well established in education, and noise tends to appear in the tails of the kernel estimate when it is applied to long-tailed distributions (because the kernel window size, or bandwidth, is fixed for the entire sample).
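The three sampling schemes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB code of the in silico fungi program. The function names and sample values are hypothetical. The ‘Gauss’ mode is assumed to fit the normal curve by the sample mean and standard deviation. The ‘Kernel’ mode is assumed to use a Gaussian kernel with one fixed bandwidth, sampled by picking a measured value at random and adding Gaussian noise. This smoothed bootstrap is equivalent to drawing from the area under a Gaussian kernel curve.

```python
import random
import statistics


def draw_actual_data(sample, n, rng):
    """'Actual Data' mode: draw measured values with replacement."""
    return [rng.choice(sample) for _ in range(n)]


def draw_gauss(sample, n, rng):
    """'Gauss' mode: draw from a normal curve fitted by the sample mean and SD."""
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def draw_kernel(sample, n, rng, bandwidth):
    """'Kernel' mode (assumed Gaussian kernel): choosing a measured value at
    random and adding N(0, bandwidth^2) noise is equivalent to drawing from
    the area under the Gaussian kernel density estimate of the sample."""
    return [rng.choice(sample) + rng.gauss(0.0, bandwidth) for _ in range(n)]


# Hypothetical measurements, for illustration only.
measured = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 3.9]
rng = random.Random(42)
print(draw_actual_data(measured, 5, rng))
print(draw_gauss(measured, 5, rng))
print(draw_kernel(measured, 5, rng, bandwidth=0.3))
```

One bandwidth serves the whole sample. A window that suits the dense centre of a distribution can therefore be too narrow for sparse tail values, which is the source of the tail noise mentioned above.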
Figures 35, 36, and 37 are rough illustrations of the methods used to generate simulation values for apical extension velocities, branching angles, and branching distances in the fungi simulation program. I would like to highlight that the raw numerical data are discrete. Thus, the ‘Actual Data’ method generates simulation values from a discrete set of numbers, while the ‘Gauss’ and ‘Kernel’ methods draw simulation values from continuous distributions. Values drawn from the area under a curve may be absent from the original measurements (sample), yet such values are likely to appear in the measurement outcomes once the same measurements are repeated. In that respect, drawing values from a continuous distribution makes it possible to imitate the variability and stochastic components of living systems.
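The difference between a discrete and a continuous source of values can be checked numerically. This sketch rests on several assumptions: the sample values and the helper name are hypothetical, and the ‘Kernel’ draws use a Gaussian kernel with a bandwidth of 0.3. Under these assumptions, resampling never produces a value outside the measured set, whereas drawing from the kernel curve produces values that were never measured.

```python
import random


def novel_fraction(sample, simulated):
    """Fraction of simulated values that do not occur in the measured sample."""
    observed = set(sample)
    return sum(1 for v in simulated if v not in observed) / len(simulated)


measured = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 3.9]  # hypothetical measurements
rng = random.Random(1)

# 'Actual Data' mode: discrete, only measured values can reappear.
resampled = [rng.choice(measured) for _ in range(1000)]
# 'Kernel' mode (assumed Gaussian kernel, bandwidth 0.3): continuous.
smoothed = [rng.choice(measured) + rng.gauss(0.0, 0.3) for _ in range(1000)]

print(novel_fraction(measured, resampled))  # 0.0
print(novel_fraction(measured, smoothed))
```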
Figure 35 Generating apical extension velocity values for the computer simulations of the filamentous fungus Neurospora crassa: (A) drawing values from the area under the kernel curve, (B) drawing values from the area under the normal curve, (C) relative frequencies of the original data (for reference only), (D) drawing values from the original data with replacement. Relative frequency counts (histograms) are used to estimate the distribution of the data from the measurements made. This figure only illustrates the main statistical concepts; proper kernel curves are given later in this section.
Figure 36 Generating branching angle values for the computer simulations of the filamentous fungus Neurospora crassa: (A) drawing values from the area under the kernel curve, (B) drawing values from the area under the normal curve, (C) relative frequencies of the original data (for reference only), (D) drawing values from the original data with replacement. Relative frequency counts (histograms) are used to estimate the distribution of the data from the measurements made. This figure only illustrates the main statistical concepts; proper kernel curves are given later in this section.
To draw the kernel curves from the collected data, I used the kernel estimation toolbox written by Horova et al. (MATLAB toolbox, Department of Mathematics and Statistics, Masaryk University).
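For readers without MATLAB, a kernel curve of this kind can be sketched in Python. This sketch swaps in Silverman's rule of thumb for the bandwidth and a Gaussian kernel. The toolbox by Horova et al. offers its own kernels and bandwidth selectors, which may give different curves. The sample values and function names are hypothetical. The single bandwidth h computed for the whole sample corresponds to the fixed window size discussed earlier.

```python
import math
import statistics


def silverman_bandwidth(sample):
    """Silverman's rule of thumb: h = 0.9 * min(SD, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(sample, h):
    """Return the kernel curve f(x) = 1/(n*h) * sum_i phi((x - x_i) / h),
    where phi is the standard normal density."""
    n = len(sample)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


measured = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 3.9]  # hypothetical measurements
h = silverman_bandwidth(measured)
f = gaussian_kde(measured, h)
for x in (1.0, 2.0, 3.0, 4.0):
    print(f"f({x}) = {f(x):.3f}")
```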
Figures 38-61 below show histograms and the associated kernel curves. The illustrations show the distributions of the parameter values that were measured and of those generated during computer simulations of the filamentous fungus Neurospora crassa. The number of histogram bins was set to 24 for every parameter. Original image files with the associated numerical data can be found under the following link: Histograms and Kernel Curves of the key growth indicators of Neurospora crassa.
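The figure captions mention two processing steps: normalisation to [0; 1] and relative frequency counts over 24 bins. Both can be sketched as below. The section does not state how the values were normalised, so min-max scaling is an assumption here, as are the function names and the sample values.

```python
def min_max_normalise(values):
    """Rescale values linearly onto the closed interval [0; 1] (assumed method)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def relative_frequencies(values, bins=24):
    """Relative frequencies of values in [0; 1] over `bins` equal-width bins.
    A value of exactly 1.0 goes into the last bin, keeping the interval closed."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]


measured = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 3.9]  # hypothetical measurements
normalised = min_max_normalise(measured)
freqs = relative_frequencies(normalised)
print(len(freqs), round(sum(freqs), 6))  # 24 1.0
```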
Figure 37 Generating branching distance values for the computer simulations of the filamentous fungus Neurospora crassa: (A) drawing values from the area under the kernel curve, (B) drawing values from the area under the normal curve, (C) relative frequencies of the original data (for reference only), (D) drawing values from the original data with replacement. Relative frequency counts (histograms) are used to estimate the distribution of the data from the measurements made. This figure only illustrates the main statistical concepts; proper kernel curves are given later in this section.
Figure 38 Histogram of the apical extension velocities. The values used were measured for Neurospora crassa growing on an agar substrate. The number of histogram bins is 24.
Recorded values were normalised to fall within the closed interval [0; 1].
Figure 39 Kernel curve of the apical extension velocities distribution. The values were measured for Neurospora crassa growing on an agar substrate.
Recorded values were normalised to fall within the closed interval [0; 1]. The curve is an alternative to the histogram in Figure 38.
Figure 40 Histogram of the apical extension velocities. The values used were generated in the ‘Actual Data’ simulation mode for Neurospora crassa growing on an agar substrate. The number of histogram bins is 24.
Recorded values were normalised to fall within the closed interval [0; 1].
Figure 41 Kernel curve of the apical extension velocities distribution. The values used to construct the curve were generated by running the simulation in the ‘Actual Data’ mode.
Recorded values were normalised to fall within the closed interval [0; 1].
Figure 42 Histogram of the apical extension velocities. The values used were generated by running the simulation in the ‘Gauss’ mode. The number of histogram bins is 24.
Recorded values were normalised to fall within the closed interval [0; 1].
Figure 43 Kernel curve of the apical extension velocities distribution. The values used to produce the curve were generated by running the simulation in the ‘Gauss’ mode.
Recorded values were normalised to fall within the closed interval [0; 1].
Figure 44 Histogram of the apical extension velocities. The values used to produce the histogram were generated by running the simulation in the ‘Kernel’ mode. The number of histogram bins is 24.
Recorded values were normalised to fall within the closed interval [0; 1].
Figure 45 Kernel curve of the apical extension velocities distribution. The values used to construct the curve were generated by running the simulation in the ‘Kernel’ mode.
Recorded values were normalised to fall within the closed interval [0; 1].
Figure 46 Histogram of the branching angle values. The values used to construct the histogram come from the data collected for Neurospora crassa growing on an agar substrate. The number of histogram bins is 24.
The histogram on the left is for hyphae branching on the left side of the parent hypha, while the histogram on the right is for hyphae branching on the right side of the parent hypha.
Figure 47 Kernel curve of the branching angle distribution for the values measured for Neurospora crassa growing on an agar substrate.
The kernel curve on the left (negative values on the X axis) is for hyphae branching on the left side of the parent hypha, while the kernel curve on the right (positive values on the X axis) is for hyphae branching on the right side of the parent hypha.
Figure 48 Histogram of the branching angle values. The values used were generated by running the simulation in the ‘Actual Data’ mode. The number of histogram bins is 24.
The histogram on the left is for hyphae branching on the left side of the parent hypha, while the histogram on the right is for hyphae branching on the right side of the parent hypha.
Figure 49 Kernel curve of the branching angle distribution. The values used to construct the curve were generated by running the simulation in the ‘Actual Data’ mode.
The kernel curve on the left (negative values on the X axis) is for hyphae branching on the left side of the parent hypha, while the kernel curve on the right (positive values on the X axis) is for hyphae branching on the right side of the parent hypha.
Figure 50 Histogram of the branching angle values. The values used to construct the histogram were generated by running the simulation in the ‘Gauss’ mode. The number of histogram bins is 24.
The histogram on the left is for hyphae branching on the left side of the parent hypha, while the histogram on the right is for hyphae branching on the right side of the parent hypha.
Figure 51 Kernel curve of the branching angle distribution. The values used to draw the curve were generated by running the simulation in the ‘Gauss’ mode.
The curve on the left (negative values on the X axis) is for hyphae branching on the left side of the parent hypha, while the curve on the right (positive values on the X axis) is for hyphae branching on the right side of the parent hypha.
Figure 52 Histogram of the branching angle values. The values used to construct the histogram were generated by running the simulation in the ‘Kernel’ mode. The number of histogram bins is 24.
The histogram on the left (negative values on the X axis) is for hyphae branching on the left side of the parent hypha, while the histogram on the right (positive values on the X axis) is for hyphae branching on the right side of the parent hypha.
Figure 53 Kernel curve of the branching angle distribution. The values used to draw the curve were generated by running the simulation in the ‘Kernel’ mode.
The curve on the left (negative values on the X axis) is for hyphae branching on the left side of the parent hypha, while the curve on the right (positive values on the X axis) is for hyphae branching on the right side of the parent hypha.
Figure 54 Histogram of the branching distances. The values used were measured for Neurospora crassa growing on an agar substrate. The number of histogram bins is 24.
Values in the long tail of the distribution are not noise; they are more likely to occur for parent hyphae. This property is implemented in the simulation program.
Figure 55 Kernel curve of the branching distances distribution. The values were measured for Neurospora crassa growing on an agar substrate.
Values in the long tail of the distribution are not noise; they are more likely to occur for parent hyphae. This property is implemented in the simulation program.
Figure 56 Histogram of the branching distances. The values used were generated by running the simulation in the ‘Actual Data’ mode. The number of histogram bins is 24.
Values in the long tail of the distribution are not noise; they are more likely to occur for parent hyphae. This property is implemented in the simulation program.
Figure 57 Kernel curve of the branching distances distribution. The values used to draw the curve were generated by running the simulation in the ‘Actual Data’ mode.
Values in the long tail of the distribution are not noise; they are more likely to occur for parent hyphae. This property is implemented in the simulation program.
Figure 58 Histogram of the branching distances. The values used to construct the histogram were generated by running the simulation in the ‘Gauss’ mode. The number of histogram bins is 24.
The long tail present in the measured data, as well as in the data generated in the ‘Actual Data’ or ‘Kernel’ modes, does not exist in the histogram generated in the ‘Gauss’ mode.
Figure 59 Kernel curve of the branching distances distribution. The values used to draw the curve were generated by running the simulation in the ‘Gauss’ mode.
The long tail present in the measured data, as well as in the data generated in the ‘Actual Data’ or ‘Kernel’ modes, does not exist in the kernel curve generated in the ‘Gauss’ mode of the in silico simulation program. This means that this method does not imitate the proportions that occur in the real world for Neurospora crassa.
Figure 60 Histogram of the branching distances. The values used to construct the histogram were generated by running the simulation in the ‘Kernel’ mode. The number of histogram bins is 24.
Values in the long tail of the distribution are not noise; they are more likely to occur for parent hyphae. This property is implemented in the simulation program.
Figure 61 Kernel curve of the branching distances distribution for Neurospora crassa growing on an agar substrate. The values used to draw the curve were generated by running the simulation in the ‘Kernel’ mode.
Values in the long tail of the distribution are not noise; they are more likely to occur for parent hyphae. This property is implemented in the simulation program.
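The missing long tail in the ‘Gauss’ mode can be quantified with a simple tail-mass comparison. In this sketch, the branching distances, the 200 µm threshold, and the 15 µm kernel bandwidth are all hypothetical. The ‘Kernel’ mode is assumed to use a Gaussian kernel. The comparison is only an illustration, not an analysis reported in this section.

```python
import random
import statistics


def tail_mass(values, threshold):
    """Fraction of values strictly above `threshold`."""
    return sum(1 for v in values if v > threshold) / len(values)


# Hypothetical long-tailed branching distances (micrometres).
measured = [40, 45, 50, 52, 55, 58, 60, 62, 65, 70, 75, 80, 120, 180, 260]
rng = random.Random(7)
n = 20000

# 'Gauss' mode: normal curve fitted by mean and SD.
mu, sigma = statistics.mean(measured), statistics.stdev(measured)
gauss = [rng.gauss(mu, sigma) for _ in range(n)]
# 'Kernel' mode: assumed Gaussian kernel with a fixed 15 um bandwidth.
kernel = [rng.choice(measured) + rng.gauss(0.0, 15.0) for _ in range(n)]

for name, data in (("measured", measured), ("Gauss", gauss), ("Kernel", kernel)):
    print(f"{name:8s} P(distance > 200) = {tail_mass(data, 200):.3f}")
```

The fitted normal curve spreads the tail values symmetrically around the mean. It therefore assigns far less probability to the longest distances than the measured sample does, while the kernel draws keep that probability close to the measured proportion.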
To further validate the simulation results, I overlaid the relative frequency counts of the measured and simulated data to evaluate which of the considered statistical approaches best describes the Neurospora crassa phenotype.
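The overlays were compared visually. As a hypothetical complement, the overlap of two relative frequency histograms built on a common set of bins can be summarised by a single number, the total variation distance (0 for identical histograms, 1 for no overlap). The function names, the 24-bin default, and the choice of this particular measure are assumptions for illustration only.

```python
def relative_frequencies(values, lo, hi, bins=24):
    """Relative frequencies over `bins` equal-width bins spanning [lo; hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def overlay_distance(measured, simulated, bins=24):
    """Total variation distance between the measured and simulated histograms,
    both built on the same bins so that the overlay is like-for-like."""
    lo = min(min(measured), min(simulated))
    hi = max(max(measured), max(simulated))
    p = relative_frequencies(measured, lo, hi, bins)
    q = relative_frequencies(simulated, lo, hi, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


measured = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 3.9]  # hypothetical measurements
print(overlay_distance(measured, measured))  # 0.0
print(overlay_distance(measured, [v + 10.0 for v in measured]))
```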
The results showed that the majority of the generated values meet the logical conditions and resemble the observed data. Regarding precision, the kernel method seems to give the best estimate. However, the method has one major drawback: it can generate negative values. To tackle this problem, I discarded the negative values and kept only the positive ones when executing the in silico fungi MATLAB simulation program. Therefore, even if the ‘Kernel’ mode is not the most accurate one, it is still expected to give the best results when simulating the apical extension velocities of Neurospora crassa.
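Discarding negative values can be sketched as rejection sampling from the kernel curve: draw a value, keep it only if it is positive, and repeat until enough values are collected. The text does not state whether the MATLAB program redraws in this way or simply drops the rejected values, so the loop below is an assumption. The Gaussian kernel, the bandwidth, and the sample values are also assumptions. Rejecting negatives in this way is equivalent to cutting the kernel curve at zero and rescaling the remaining positive part so that it again integrates to one.

```python
import random


def draw_positive_kernel(sample, n, bandwidth, rng):
    """Draw n values from an (assumed) Gaussian kernel curve of `sample`,
    discarding any non-positive draw and drawing again until n values are kept."""
    kept = []
    while len(kept) < n:
        v = rng.choice(sample) + rng.gauss(0.0, bandwidth)
        if v > 0:  # logical condition: velocities must be positive
            kept.append(v)
    return kept


# Hypothetical apical extension velocities close to zero (arbitrary units), so
# that the uncut kernel curve would put some probability on negative values.
measured = [0.05, 0.1, 0.2, 0.4, 0.8, 1.5]
rng = random.Random(3)
velocities = draw_positive_kernel(measured, 1000, bandwidth=0.2, rng=rng)
print(min(velocities) > 0, len(velocities))  # True 1000
```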
Figure 62 Apical extension velocities - comparison of the measured and simulated data: (A) relative frequency count of the measured apical extension velocities for Neurospora crassa growing on an agar substrate, (B) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Kernel’ mode, (C) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Actual Data’ mode, (D) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Gauss’ mode.
The values generated in the ‘Kernel’ mode best imitate the data distribution observed in the laboratory setting. A minor drawback of the kernel method is the presence of negative values produced by the estimation. However, the in silico fungi program discards any values that do not meet the logical conditions, so all negative values are discarded during execution of the program.
Comparison of the simulation results with the measured (observed) values showed that the ‘Kernel’ method best imitates the branching angle values observed in populations of Neurospora crassa in the laboratory setting. Although the ‘Gauss’ method also imitates the data well, it misses the values in the tails of the distributions, which are observed in the real world and are not simply noise.
The ‘Actual Data’ method also replicates the observed values well. However, it is the weakest when it comes to reflecting data proportions (that is, the probabilities of occurrence of particular values). An additional conclusion from the analysis of the relative frequency counts presented above is that Neurospora crassa does not branch symmetrically: a branch sent on the left side is more likely to leave the parent hypha at an angle greater than 90°. Therefore, even if branches extend simultaneously in various random directions, the colony biomass is expected to be shifted to the right as the branches extend and cross each other.
Figure 63 Branching angles - comparison of the measured and simulated data: (A) relative frequency count of the measured branching angles for Neurospora crassa growing on an agar substrate, (B) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Kernel’ mode, (C) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Actual Data’ mode, (D) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Gauss’ mode.
The illustrations above show that the values generated in the ‘Kernel’ mode best imitate the data distribution observed in the laboratory setting. The ‘Gauss’ method also reflects the real-world data well in this case. However, it omits the extreme branching angle values that occur while Neurospora crassa forms branches and a colony, and these values should therefore not be neglected. The ‘Actual Data’ method seems to be the worst at replicating data proportions, although it is still precise in reflecting the values observed in the laboratory.
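The left-right asymmetry in branching angles can be expressed as two conditional probabilities. The first is the probability that a branch leaves the parent at more than 90° given that it is on the left (negative angle). The second is the same probability given that it is on the right (positive angle). The angles below are hypothetical and only illustrate the calculation; they are not the measured data, and the function name is also hypothetical.

```python
def wide_branch_probability(angles):
    """Return (P_left, P_right): the probability that a branch leaves the parent
    at more than 90 degrees, separately for left (negative) and right (positive)
    branches."""
    left = [-a for a in angles if a < 0]
    right = [a for a in angles if a > 0]
    p_left = sum(1 for a in left if a > 90) / len(left)
    p_right = sum(1 for a in right if a > 90) / len(right)
    return p_left, p_right


angles = [-120, -95, -60, -45, -100, 30, 50, 70, 95, 40]  # hypothetical, degrees
print(wide_branch_probability(angles))  # (0.6, 0.2)
```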
A comparative analysis of the simulated versus measured branching distances again shows the superiority of the ‘Kernel’ estimation method. Firstly, the ‘Kernel’ method best imitates the data in the long tail of the distribution. Secondly, it reflects the proportions between the data more precisely than the alternative methods. Another interesting outcome of the visualization is a clear difference between the overlay for the ‘Actual Data’ mode, where the values are discrete, and the two remaining modes, in which the data are continuous. Figure 64C shows that the relative frequency bins overlap perfectly, whereas in Figures 64B and 64D some of the bins exist for the simulated data but not for the measured data.
Figure 64 Branching distances - comparison of the measured and simulated data: (A) relative frequency count of the measured apical branching distances for Neurospora crassa growing on an agar substrate, (B) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Kernel’ mode, (C) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Actual Data’ mode, (D) comparison of (A) and the relative frequency count of the simulated values generated in the ‘Gauss’ mode.
The values generated in the ‘Kernel’ mode best imitate the data distribution observed in the laboratory setting. Although the ‘Gauss’ method seems to reflect the laboratory data quite well, a closer look shows that distances between 100 and 200 µm do not occur as frequently in the real world as the simulated values would suggest. Moreover, the ‘Kernel’ method better reflects the data in the long tails of the distribution. Interestingly, the data simulated in the ‘Actual Data’ mode perfectly reflect the positions of the bins of the histogram constructed for the measured data, which also validates the in silico program and the statistical methods used. It should be kept in mind that, even if a statistical method does