Kinetic and bioinformatics characterization of cellulases from the metagenomic library of GEBIX specific to African oil palm (Elaeis guineensis Jacq.) empty fruit bunch



Kinetic and bioinformatics characterization of cellulases from the metagenomic library of GEBIX specific to African oil palm (Elaeis guineensis Jacq.) empty fruit bunch
LAURA MARCELA PALMA MEDINA
Thesis submitted in fulfillment of the requirements for the degree of Master of Science
Thesis advisor: ANDRÉS FERNANDO GONZÁLEZ BARRIOS, PhD
UNIVERSIDAD DE LOS ANDES
FACULTY OF ENGINEERING
CHEMICAL ENGINEERING DEPARTMENT
BOGOTÁ D.C., 2012

ACKNOWLEDGEMENT

The support of many people made this thesis successful, and I would like to thank them for their contributions to this project. I would like to thank Andrés González for all the knowledge, patience and dedication he gave to me, to my learning process and to the development of this project. I also thank Diana Catalina Ardila Montoya, whose support was of great importance to the project. I deeply thank Maria Mercedes Zambrano and Cesar Osorio for all their lessons and constant support. To all my professors, especially Watson Vargas, Rocio Sierra and Felipe Muñoz. I thank my family and friends for their support in hard moments and for supporting my decisions. To my laboratory companions and the department staff, especially Luis Medina and Viviana Ferreira, for their ideas and constant help during the laboratory stages. Thanks to the Chemical Engineering Department and Corpogen for allowing me to use their facilities and equipment, and for providing the metagenomic library used to obtain the studied clones. Thanks to LAMFU, especially to its head Silvia Restrepo, for financing the sequencing and for experimental help.

TABLE OF CONTENTS
1. ABSTRACT
2. STATED PROBLEM
3. OBJECTIVES
3.1. General Objective
3.2. Specific Objectives
4. STATE OF THE ART
4.1. Biofuels importance
4.2. African Oil Palm as a substrate for biofuels production
4.3. Lignocellulose composition
4.3.1 Cellulose
4.3.2 Hemicellulose
4.3.3 Lignin
4.4. Degradation activity of cellulases
4.4.1 Families of cellulases
4.4.2 Producer organisms of cellulases
4.5. Reaction kinetics models for cellulases
4.6. The Metagenomic approach
4.6.1 Cellulases metagenomic studies
4.6.2 GEBIX metagenomic library
4.7. Bioinformatics tools for analysis of cellulases sequences
4.7.1 Sequences alignments
4.7.1.1 Algorithms for sequences comparison
4.7.2 Molecular modeling of cellulases
4.7.2.1 Protein structure prediction algorithm
4.7.2.2 Molecular docking fundamentals
5. METHODOLOGY
5.1. Oil palm empty fruit bunch pretreatment
5.2. Selection of solid medium for degradation recognition
5.3. Screening on liquid medium of the metagenomic library
5.4. Selection of clones with oil palm empty fruit bunch degradation capacity
5.5. Effect of metal ion, pH and temperature over the cellulase activity
5.6. Assays for kinetic model adjustment
5.7. Determination of molecular weight range by ultrafiltration
5.8. Fosmids sequencing and recognition of cellulase sequences
5.9. Protein structure prediction of found cellulases
5.10. Molecular docking of found cellulases
5.11. Comparison with other cellulase sequences
6. RESULTS AND DISCUSSION
6.1. Library screening
6.2. Selection of clones with oil palm empty fruit bunch degradation capacity
6.3. Effect of metal ion, pH and temperature over the cellulase activity
6.4. Kinetic model adjustment
6.5. Determination of molecular weight range by ultrafiltration
6.6. Fosmids sequencing and recognition of cellulase sequences
6.7. Structure modeling and molecular docking
6.8. Comparison with other cellulase sequences
7. CONCLUSIONS
8. RECOMMENDATIONS
REFERENCES
SUPPLEMENTAL MATERIAL

LIST OF FIGURES
Figure 1. pCC1FOS map (CopyControl®, w.d.).
Figure 2. Preparation of a metagenomic library with fosmids (CopyControl®, w.d.).
Figure 3. Selected media stained with Congo Red; the yellow halos represent the areas where the CMC was degraded.
Figure 4. Growth curves (semi-log) of the 13 colonies that showed cellulolytic activity on solid media.
Figure 5. Measurements of reducing sugars in media by the DNS method. The ten colonies were selected from the growth curves. Seven colonies show degradation of OPEFB, but only five generate a considerable amount of sugars.
Figure 6. HPLC result for clone 13. The peak at 10.374 minutes was recognized as cellobiose, and the peak at 15.297 minutes could be xylose, one of the components of hemicellulose. The results for the other three clones were similar.
Figure 7. Effect of metal ions (A), pH (B) and temperature (C) on the degradation of OPEFB after two hours of reaction. Reducing sugars were measured by the phenol-sulfuric acid method.
Figure 8. Experimental results for kinetic assays of colony 4 without metal ion (A), colony 4 with addition of metal ion (B), colony 8 without metal ion (C) and colony 8 with addition of metal ion (D). All the curves were adjusted to a fractal kinetics model (continuous lines). The percentages are the quantity of OPEFB in each reaction.
Figure 9. Results of molecular docking of 3XQXA (A, B), contig 791 (C, D) and contig 2669 (E, F). Ligand: polysaccharide of five glucoses.
Figure 10. Results of molecular docking of 2XHYA (A-B), contig 1847 (C-D) and contig 6490 (E-F). Ligand: cellobiose.
Figure 11. Results of molecular docking of 2XHYA (A-B), contig 1847 (C-D) and contig 6490 (E-F). Ligand: polysaccharide of five glucoses.
Figure 12. Results of molecular docking of 3UT0A (A-B) and contig 3141 (C-D). Ligand: cellobiose.
Figure 13. Results of molecular docking of 3UT0A (A-B) and contig 3141 (C-D). Ligand: polysaccharide of five glucoses.

LIST OF TABLES
Table 1. Composition of wastes of oil palm production. Percentages of cellulose, hemicellulose and lignin (González et al., 2008).
Table 2. Fractal dimension and kinetic constant for the reactions carried out with protein extracts of colonies 4 and 8. The reactions were carried out with and without metal ion.
Table 3. Quantity of reducing sugars after 2 hours of reaction with the solutions obtained after ultrafiltration.
Table 4. Concentration of proteins in the final solutions after ultrafiltration.
Table 5. Quantity of proteins in the final solutions after ultrafiltration.
Table 6. Identification of domains in each contig that had BLAST results related to cellulases.
Table 7. Results of the quality check of the 3D modeled structures. The results were obtained on the Swiss PDB platform®.
Table 8. Results of molecular docking simulations.
Table 9. Amino acids that interacted with the ligand and are conserved in sequences of cellulases produced by other organisms.

1. ABSTRACT

Biofuels are a field that has attracted researchers around the globe as an alternative to fossil fuels; therefore, new ways of obtaining them are necessary. In the current study, cellulases with high rates of degradation of cellulose from the empty fruit bunch of the African oil palm (Elaeis guineensis Jacq.) were obtained from the metagenomic library of GEBIX (Centro Colombiano de Genómica y Bioinformática). To find these cellulases, the library was screened on solid and liquid media, yielding 13 candidate colonies with cellulolytic activity. Subsequently, the degradation of oil palm empty fruit bunch (OPEFB) in growth media was measured by colorimetric assays and high performance liquid chromatography, which detected reducing sugars indicating OPEFB degradation. Two colonies were selected to determine favorable degradation conditions. The favorable pH and temperature ranges are 4 to 8.5 and 30 °C to 60 °C, respectively, similar to those already reported. An increase in cellulose degradation was detected when potassium was added to the reaction with one of the protein extracts. The highest sugar production was detected for colony 4 with metal ion addition at pH 4 and 50 °C, and for colony 8 without salt addition at pH 8.5 and 40 °C. A fractal-like kinetic model was fitted to the reactions; the fractal dimension depended on the initial substrate concentration and varied from 0.12 to 0.38, which corresponds to the range of values reported for cellulases and other enzymes. In parallel, the inserts were sequenced in order to find cellulase sequences. Three glycosyl hydrolase families were found in five contigs, and their structures were modeled by homology and energy minimization, giving candidate structures of acceptable quality for the cellulases found.
Finally, the possible active site of each protein and the amino acids relevant to each reaction were identified by molecular docking. Asparagine, arginine, glutamine, glutamic acid and aspartic acid are some of the amino acids found in probable active sites. Most of the amino acids bound to the ligands are conserved in ten other homologous sequences, indicating a possible fundamental role in the degradation reaction. The relevant non-conserved amino acids in the active sites could be important for the specific degradation of oil palm empty fruit bunch.

2. STATED PROBLEM

Nowadays, developing new and sustainable alternatives for obtaining energy is considered necessary, mainly because of the decreasing amount of available fossil fuel and the increasing demand for energy in every country as a requirement for economic growth. Environmental conservation and the Greenhouse Effect caused by fossil fuel derivatives have also been discussed as matters of the highest relevance all over the world, leading nations to consider cooperative agreements aimed at reducing the emission of gases proven to cause the Greenhouse Effect, mainly the gases obtained from the combustion of fossil fuels, such as carbon dioxide, methane, nitrogen oxides and chlorofluorocarbons (Acosta, 2004). In the search for renewable energy sources, different options have been suggested, such as wind energy, solar energy, microbial fuel cells and biofuels. For Colombia, a country with a strong agroindustrial focus, biofuels are considered the most viable option. Thus, since the early years of the current decade, legislation has required that all fuel sold for vehicles be a mixture of traditional oil-derived fuel and fuel-grade alcohol, as a cheaper and environmentally friendly response to the projected reduction in national oil reserves. Since 2001, the fuel-grade alcohol consumed by vehicle users in Colombia has come from sugarcane, sugar beet and manioc (Ministerio de Agricultura, w.d.). However, a discussion has emerged about these sources, because they are products included in the basic food basket: their prices rise as a growing fraction of the total harvest is assigned to ethanol production, leaving a smaller fraction available as food at a higher price. Therefore, new sources of biofuel have been sought, among them biomass as a source of second-generation fuels.

Nowadays, the extraction of palm oil from the African palm is a well-established process in Colombia because of the diverse uses of this product. Colombia is considered one of the main producers worldwide, which suggests using palm oil to produce biodiesel while recycling the residual waste from oil extraction as a valuable source of lignocellulose (González, 2008). Since the lignocellulosic material must be pretreated with acids, it is desirable to find enzymes that degrade cellulosic material to reducing sugars and resist low pH conditions; this would reduce the cost of the bioethanol production process. Such cellulases are expected to be produced by extremophile organisms. Therefore, the aim is to find, in a metagenomic library provided by GEBIX (Centro Colombiano de Genómica y Bioinformática), clones that produce cellulases capable of degrading lignocellulosic material into reducing sugars, specifically the lignocellulose found in the oil palm empty fruit bunch (OPEFB). The library clones were constructed from the metagenome of an environment where extremophile organisms could be found, so proteins with acceptable behavior under process conditions are expected.

3. OBJECTIVES

3.1. General Objective

To characterize cellulases from the GEBIX (Centro Colombiano de Genómica y Bioinformática) metagenomic library by determining their kinetics, cellulase family, 3D structure and preferred orientation with the ligand.

3.2. Specific Objectives

3.2.1. To determine the presence of cellulases with affinity to African oil palm empty fruit bunch in an extreme-environment metagenomic library.
3.2.2. To determine favorable conditions for cellulose degradation reactions and their kinetics.
3.2.3. To establish a protocol to concentrate proteins.
3.2.4. To carry out bioinformatics analyses to determine membership in a cellulase family.
3.2.5. To predict the tertiary structure from the amino acid sequences, and to determine the protein-substrate complex structure by molecular docking.

4. STATE OF THE ART

4.1. Biofuels importance

Biofuels are fuels obtained from biological and renewable sources, and their use, trade and production have increased in recent years. This growth comes from the need for new renewable energy sources capable of replacing fossil fuels. Biofuels are also being developed because of their environmentally friendly nature: their combustion reduces the emission of gases that cause the Greenhouse Effect. This is because they contain about 35% oxygen, which allows lower emissions of nitrogen oxides, and because of their higher vaporization rate compared with standard fossil fuels. Nowadays, two biofuels are widely used worldwide: biodiesel and bioethanol. The latter may be obtained by three different processes: fermentation of organic compounds, molecular segregation and cellulose hydrolysis (Bhattarai K. et al., 2011). Like biodiesel, bioethanol contains no sulfur or aromatic compounds; hence, using it as a 10% fraction in common fuel reduces carbon monoxide emissions by between 22 and 50 percent, and its use also reduces total hydrocarbon emissions (ibid.). The United States is the main producer of this fuel, followed by Brazil and China. A discussion has emerged regarding the sources used for biofuel production, because these products are included in the basic food basket (e.g., sugarcane, sugar beet and corn) and, consequently, their price increases with the continuous rise in the fraction of the total harvest assigned to ethanol production. Therefore, new sources of biofuel have attracted the interest of both academia and industry, among them biomass as a source of second-generation fuels (Banerjee A., 2011).
Given the reduction in available fossil oil sources, the Colombian government has been implementing in the last few years a plan to encourage and support the development of biofuels, foreseeing a future lack of fossil oil and a massive (as well as expensive) oil importation. Taking into account the strong agroindustrial focus of Colombia's industry, the production of biofuels also seeks to extend the available market and to diversify agricultural production. These factors led the Government to make biofuels one of its main policies across the agricultural, environmental and energy sectors of the economy. Since 2001, different laws and decrees have been promulgated regarding aids, exemptions and benefits aimed at creating a favorable environment for the production of biofuels (Fedepalma, 2007). Part of these policies is to increase the cultivable land assigned to grow raw materials for biofuel production, as a way to successfully replace illegal crops while improving the well-being of farmers.

4.2. African Oil Palm as a substrate for biofuels production

African palm oil is used for several purposes closely related to everyday life: it is suitable for human consumption, and its derivatives have become essential raw materials for the production of soaps, laundry detergents and inks, among many others. This palm is typical of warm climates and is mainly found in the departments of Meta and Cesar, with a period of 20 to 25 years between field renewals. Nowadays, the extraction of palm oil from the African palm is a well-established process in Colombia because of the diverse uses that can be given to it. Colombia is considered one of the main producers worldwide, which suggests that the residual waste from oil extraction could be a valuable source of lignocellulose, one of the alternatives proposed for biofuel production (González-Barrios, 2008). Umikalsom et al. (1997) reported that fresh fruit bunches contain 21% palm oil, 27% water and 52% waste, and that 44.2% of this waste is empty fruit bunch (OPEFB). Previous studies (Umikalsom et al., 1997; González et al., 2008) have characterized the residual waste obtained while extracting oil from the African palm. The part called cuesco was found to be the least suitable source because of its high lignin and low hemicellulose contents, while the fiber has intermediate levels of both, and the OPEFB has a high percentage of hemicellulose and cellulose, which makes it the most promising source for ethanol production among the side products of palm oil extraction (Table 1). To produce bioethanol, the cellulose must be saccharified, a process that can be carried out with cellulases.

Table 1. Composition of wastes of oil palm production. Percentages of cellulose, hemicellulose and lignin (González et al., 2008).
Residue | % Cellulose | % Hemicellulose | % Lignin
Empty fruit bunch | 44.97 ± 0.44 | 19.92 ± 0.40 | 10.23 ± 0.08
Empty fruit bunch | 46.77 ± 5.39 | 17.92 ± 4.89 | 4.15 ± 0.53
Fiber | 33.21 ± 0.02 | 16.47 ± 1.31 | 23.47 ± 1.8
Fiber | 30.28 ± 0.14 | 16.58 ± 0.06 | 21.79 ± 0.01
Cuesco | - | 11.29 ± 1.17 | 43.56 ± 5.2
Cuesco | - | 12.72 ± 0.05 | 49.58 ± 0.15
Cuesco | - | 6.68 ± 0.94 | 57.31 ± 1.28

4.3. Lignocellulose composition

4.3.1 Cellulose

Cellulose is a structural macromolecule found in plants; it consists of long chains of β-glucose units and is considered the most abundant organic polymer in nature. This polymer is unbranched: its chains join to one another to form fibrils arranged in a plate-like network that gives strength to the structure. Despite its affinity for water, it is insoluble in water and in most organic solvents. The glucose units are joined by β-1,4-glycosidic bonds, with hydrogen bonds between chains (Berg, 2001; Lehninger, 2005; Boyer, 2000). The formula of cellulose is (C6H10O5)n, where n ranges from 500 to 5000 depending on its origin, containing between 10000 and 15000 glucoses. The difficulty in degrading cellulose lies in its hydrogen bonds and van der Waals interactions with other structures such as lignin and hemicellulose, which make it more resistant. Cellulose can be hydrolyzed using concentrated acids; hence, the process developed to transform cellulose into ethanol includes a chemical pretreatment that removes lignin and hemicelluloses with some loss of cellulose (Umikalsom et al., 1997).

4.3.2 Hemicellulose

Like cellulose, hemicellulose is a structural component of plants formed by long chains with β-1,4 bonds. However, its main monomers are xylose, glucose, fructose and galactose, and its chains are branched. Hemicellulose usually surrounds the cellulose (Lehninger, 2005). Because of its branches, hemicellulose is easier to hydrolyze with concentrated acids than cellulose, which is why this method is used in the pretreatment. Its chains are also shorter than those of cellulose.

4.3.3 Lignin

Together with cellulose, lignin constitutes wood. It is formed by aromatic alcohols, which make it branched and amorphous. Its main role is to fill the gaps between cellulose and hemicellulose. Lignin cannot be hydrolyzed down to its original monomers, which increases the complexity of the hydrolysis of cellulose and hemicellulose and makes the pretreatment necessary.

4.4. Degradation activity of cellulases

The main function of cellulases is to hydrolyze cellulose down to its glucose monomers. They are mainly produced by fungi and bacteria, a common cause of decomposition in logs. There are three kinds of cellulases: endoglucanases, exoglucanases and β-glucosidases (Lehninger, 2005). Endoglucanases can bind to any point of the cellulose chain, which is why their main function is to divide long cellulose chains into sets of shorter ones. Exoglucanases attack chains starting from their ends, which is why they break short chains into cellobiose (two glucose molecules joined by a β-1,4 bond), a molecule that is hydrolyzed by β-glucosidases to obtain glucose monomers as final products (Lynd L.R. et al., 2002). The correct degradation of cellulose into reducing sugars requires the presence of the three types of cellulases, or of cellulosomes. Cellulosomes are complexes of these enzymes that are difficult to separate without losing their functionality. Free cellulases are more common in fungi, which are also known to produce larger quantities of cellulases (more than 20 g/L); in contrast, bacteria usually produce cellulosomes and lower amounts of cellulases (less than 0.1 g/L) (Knowles, 1987). Cellulases have three parts: a catalytic domain, a tail that recognizes cellulose, also called the carbohydrate-binding module, and a linker that joins the other two parts. The principal function of the tail is to bring the enzyme close to the cellulose and to facilitate its hydrolysis (Knowles, 1987; Lynd et al., 2002).

4.4.1 Families of cellulases

Cellulases belong to the glycosyl hydrolases, a group of enzymes with a clear classification by sequence similarity. Every member of a family is expected to have a similar catalytic mechanism and, therefore, a similar tertiary structure, although some enzymes from different families may also share a similar three-dimensional fold (Henrissat, 1993; Claeyssens, 1992). Furthermore, a single microorganism can produce cellulases that belong to different families and that therefore differ considerably in tertiary structure, sequence and/or catalytic site (Lynd, 2002).
Within a family, sequences can be compared at the catalytic site, where aspartic acid and glutamic acid residues are expected to be conserved because they commonly act as catalytic residues in this group of enzymes (Henrissat, 1993).
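The conservation check described above can be illustrated with a short scan over an alignment that is assumed to be already computed (for example by a multiple alignment tool). This is a minimal sketch: the function name `conserved_acidic_columns` and the three toy fragments are hypothetical and are not sequences from this work; the only rule implemented is "every sequence has the same Asp (D) or Glu (E) in this column".

```python
def conserved_acidic_columns(alignment):
    """Return 1-based (position, residue) pairs for alignment columns in which
    every sequence carries the same aspartic acid (D) or glutamic acid (E)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for col in range(length):
        residues = {seq[col].upper() for seq in alignment}
        # A column is conserved only if all sequences share one residue.
        if len(residues) == 1:
            residue = residues.pop()
            if residue in ("D", "E"):
                hits.append((col + 1, residue))
    return hits


# Hypothetical toy alignment of three catalytic-site fragments (not real cellulase data).
toy = [
    "GWNDEL",
    "GWSDEL",
    "AWNDQL",
]
print(conserved_acidic_columns(toy))  # prints: [(4, 'D')]
```

Column 5 is not reported because one fragment carries Q instead of E; in a real family comparison such a column would still deserve attention, since a substitution at a putative catalytic position may change activity.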

Nowadays, there are around 130000 glycoside hydrolase sequences, the group to which cellulases belong, reported in the Carbohydrate-Active Enzymes (CAZy) database, classified into more than 130 families and 14 clans.

4.4.2 Producer organisms of cellulases

A large variety of organisms are known to produce cellulases or cellulosomes. Among bacteria, some of them are Acetivibrio cellulolyticus, Agrobacterium ATCC 21400, Bacillus amyloliquefaciens, Bacillus circulans, Bacillus subtilis, Bacteroides cellulosolvens, Bacteroides succinogenes, Cellulomonas fimi, Cellulomonas uda, Clostridium cellulovorans, Clostridium cellobioparum, Clostridium thermocellum, Escherichia adecarboxylata, Erwinia chrysanthemi, Microbispora bispora, Ruminococcus albus, Ruminococcus flavefaciens and Thermomonospora sp. There are also some organisms in the fungi kingdom: Aspergillus niger, Candida pelliculosa, Chaetomium globosum, Humicola insolens, Fusarium solani, Kluyveromyces fragilis, Neocallimastix frontalis, Penicillium pinophilum, Piromonas communis, Phanerochaete chrysosporium, Schizophyllum commune, Sphaeromonas communis, Sporotrichum pulverulentum, Sporotrichum thermophile, Talaromyces emersonii, Thermoascus aurantiacus, Trichoderma koningii, Trichoderma reesei and Trichoderma viride. Some of these organisms are thermophiles, which makes them important for industry because they resist the high temperatures required in some processes. Likewise, some are anaerobic and others aerobic, which makes them preferable for specific processes (Bayer, 2008; Knowles et al., 1987; Eriksson, 1981; Ryu and Mandels, 1980; Ovando-Chacón, 2005). One of the most important organisms in industry is T. reesei, because its cellulases are stable in stirred reactors at pH 4.8 and 50 °C for 48 hours or longer and are resistant to chemical inhibitors. Nevertheless, this organism does not digest lignin, and its cellulases have a low specific activity (Ryu and Mandels, 1980). C. thermocellum is well known for the production of ethanol from lignocellulose; it produces a cellulosome that is not secreted, so the ability of the cells to attach to the lignocellulosic material is an important factor in the process (Bayer, 2008).

The genus Bacillus is also important in this respect. It does not produce large quantities of β-glucosidase, but its enzymes are resistant to inhibition by glucose (Ovando-Chacón, 2005). Nevertheless, since each cellulase is highly specific to certain substrates, it is desirable to find cellulases that are specific to OPEFB and that are also stable over wide ranges of temperature or pH.

4.5. Reaction kinetics models for cellulases

Enzymes such as cellulases act as catalysts in organic reactions; in this case, cellulases degrade long cellulose chains into short chains. These reactions are commonly modeled with Michaelis-Menten kinetics. This model assumes that the reaction has two steps: first, the enzyme binds to the substrate to form a complex, and then the complex forms the product. Assuming a steady state for the enzyme-substrate complex gives Equation 1:

$$v = \frac{V_{max}\,[S]}{K_m + [S]} \qquad \text{(Eq. 1)}$$

where v is the reaction velocity, [S] is the substrate concentration, V_max is the maximum velocity and K_m is the Michaelis-Menten constant.

However, many studies of cellulases have shown that this model does not apply to some reactions and that other models are necessary. Bansal et al. (2009) classify these into four classes: empirical models, Michaelis-Menten-based models, models that include adsorption in cellulose hydrolysis, and models on soluble cello-oligosaccharides. Depending on experimental factors and the type of substrate, each model may be applicable. Most reactions with insoluble substrates such as OPEFB are described as heterogeneous because of the nature of the substrate (Arantes and Saddler, 2010). In this type of reaction two transport phenomena are relevant, diffusion and adsorption of the cellulase on cellulose, and these factors usually control the reaction. Their effect is visible as a change in the rate of product formation over time.
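Equation 1 and the fractal-like Equation 2 of Väljamäe et al. (2003), presented in the next paragraph, can be evaluated and compared numerically. The sketch below is only an illustration under stated assumptions: it estimates k and h for a single product curve by linearising Equation 2, ln(-ln(1 - P/[S]0)) = ln k + (1 - h) ln t, and applying ordinary least squares. That linearisation is one simple choice and not necessarily the fitting procedure used in this work; a nonlinear fit that shares one k per extract across several curves, as the model prescribes, would need a global regression instead. The time points and the values k = 0.05 and h = 0.25 are hypothetical, not experimental data.

```python
import math


def michaelis_menten_rate(s, v_max, k_m):
    """Eq. 1: initial velocity v = V_max * [S] / (K_m + [S])."""
    return v_max * s / (k_m + s)


def fractal_product(t, s0, k, h):
    """Eq. 2 (Väljamäe et al., 2003): P(t) = [S]0 * (1 - exp(-k * t**(1 - h)))."""
    return s0 * (1.0 - math.exp(-k * t ** (1.0 - h)))


def fit_fractal(times, products, s0):
    """Estimate (k, h) for one product curve by linearising Eq. 2.

    ln(-ln(1 - P/[S]0)) = ln k + (1 - h) * ln t is a straight line in ln t,
    so ordinary least squares gives the slope (1 - h) and intercept ln k.
    """
    xs, ys = [], []
    for t, p in zip(times, products):
        # The double logarithm is defined only for t > 0 and 0 < P < [S]0.
        if t > 0 and 0.0 < p < s0:
            xs.append(math.log(t))
            ys.append(math.log(-math.log(1.0 - p / s0)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points with t > 0 and 0 < P < [S]0")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("time points must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), 1.0 - slope


if __name__ == "__main__":
    # Synthetic, noise-free curve with hypothetical parameters.
    s0, k_true, h_true = 10.0, 0.05, 0.25
    times = [5, 10, 20, 40, 60, 90, 120]
    products = [fractal_product(t, s0, k_true, h_true) for t in times]
    k_fit, h_fit = fit_fractal(times, products, s0)
    print(f"k = {k_fit:.3f}, h = {h_fit:.3f}")  # prints: k = 0.050, h = 0.250
    # At [S] = K_m the Michaelis-Menten rate is half of V_max.
    print(michaelis_menten_rate(2.0, 10.0, 2.0))  # prints: 5.0
```

With h = 0 the exponent becomes k*t and Equation 2 reduces to first-order product formation; a positive h slows the apparent rate as the reaction proceeds, which is how the model captures transport limitations on an insoluble substrate.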
Since, at a microscopic level, the substrate matrix can be described by fractal models, it has been proposed to model these kinetics taking into account the fractal dimension, which represents the influence of these transport phenomena on the reaction velocity (Kopelman, 1988). On this basis, semiempirical models were applied to the reaction with OPEFB. Väljamäe et al. (2003) proposed Equation 2 as an approximation to describe the behavior of the reaction:

$$P(t) = [S]_0\left[1 - \exp\left(-k\,t^{1-h}\right)\right] \qquad \text{(Eq. 2)}$$

where P(t) is the product concentration at time t, [S]_0 is the initial substrate concentration, k is the kinetic constant and h is the fractal dimension. The kinetic constant represents the affinity and velocity of the reaction, and the fractal dimension represents the influence of the transport phenomena on the reaction kinetics. As previously reported (Väljamäe et al., 2003), the constant h changes with the initial substrate quantity; therefore, this model associates one constant k with each extract and one h with each reaction.

4.6. The Metagenomic approach

Until the end of the last century, it was only possible to analyze the genomes of cultivable microorganisms. However, because of the need to study microbial diversity that cannot be accessed by traditional cultivation methods, a new approach was developed: metagenomics, the genomic analysis of a whole microbial community. All the DNA extracted from an environment can be analyzed regardless of the organism it comes from. Since this approach does not require prior knowledge of the species whose genomes are gathered, it became a way to discover new enzymes and applications, attracting the attention of industry as a way to innovate in technologies and processes.

Currently, there are three different methodologies for screening metagenomic libraries. The first approach is the sequencing of libraries, which, owing to recent developments in sequencing techniques, is the most robust and powerful methodology. However, this method is highly expensive and the sequence assembly process can be laborious. Another approach is sequence hybridization: probes designed from conserved cellulase sequences are used to detect matching sequences within the fosmid inserts of the library, and positive inserts proceed to functional analysis or sequencing. Finally, functional screening is usually used to search for more specific proteins or antibiotics; it is a fast method but can be less effective than the others, because the insert must include the complete sequence of the enzyme and, moreover, an enzyme that the clone cannot express will not be detected (Schloss and Handelsman, 2003; Cowan, 2005).

4.6.1 Cellulases metagenomic studies

In the last few years, the use of metagenomic libraries to find enzymes, specifically cellulases, has increased. The first cellulase found with this method was an endoglucanase reported by Healy (1995); after that study, it was not until 2003 that another new cellulase was found (Rees et al., 2003). By 2010, 105 new cellulases had been discovered through metagenomic studies, 60 of them from soil samples (Duan, 2010). Samples for cellulase metagenomic studies are mostly collected from extreme environments, because for industry it is important that novel cellulases resist wide ranges of temperature and/or pH. For this reason, some studies measure how the activity of these cellulases changes when they are exposed to chemicals common in the industries where they are used, and also test them at different pH and temperature conditions (Voget, 2006; Liu, 2010). It is also important to characterize novel cellulases, which is why their sequences are usually determined; some studies also carry out bioinformatics analyses, such as sequence alignment, to associate the cellulases with a specific family (Wang, 2008).
There are only a few studies where functional screening is used as a prior method to find cellulases; nevertheless, most of those studies still have to obtain the sequences of the cellulases at the end to compare them with databases (Rees, 2003).

4.6.2 GEBIX metagenomic library

The metagenomic library explored in this project was built and provided by the Centro Colombiano de Genómica y Bioinformática (GEBIX). This library was constructed with DNA extracted from soil of Los Nevados National Park in Colombia. Soil samples were collected at different points under two conditions, wet and dry weather; the clones tested here correspond to samples taken in high Andean forest zones. This library was selected because the soil samples were covered by lignocellulosic material, and the environmental conditions increase the possibility of finding cellulases produced by extremophile organisms. Los Nevados reaches a maximum altitude of 5321 m above sea level, with temperatures between -3°C and 11°C, providing extreme conditions of humidity, temperature and radiation.

The total metagenomic DNA of the soil samples was inserted into pCC1FOS fosmids (Figure 1) with a capacity of approximately 40 kb; this vector carries chloramphenicol resistance. The library consists of 18432 clones of Escherichia coli EPI300 that carry the fosmids (Figure 2). The ends of each insert were sequenced in order to verify the independence of all clones.

Figure 1. pCC1FOS map (CopyControl®, w.d.).
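As a rough sense of scale, the total amount of cloned metagenomic DNA follows directly from the clone count and insert size given above. The sketch below assumes every fosmid carries exactly 40 kb, which is only the nominal capacity, so the result is an upper-end estimate; the 5 Mb "average bacterial genome" used for genome equivalents is a hypothetical round number chosen only for illustration.

```python
# Rough scale estimate for the GEBIX fosmid library (assumptions labeled below).
N_CLONES = 18432   # clones in the library (from the text)
INSERT_KB = 40     # nominal pCC1FOS insert capacity in kb, assumed for every clone
GENOME_MB = 5.0    # hypothetical average bacterial genome size, illustration only


def library_size_mb(n_clones: int, insert_kb: float) -> float:
    """Total cloned DNA in megabases."""
    return n_clones * insert_kb / 1000.0


def genome_equivalents(total_mb: float, genome_mb: float) -> float:
    """Number of average-sized genomes the cloned DNA could cover."""
    return total_mb / genome_mb


total = library_size_mb(N_CLONES, INSERT_KB)
print(f"total cloned DNA: {total:.1f} Mb")  # total cloned DNA: 737.3 Mb
print(f"genome equivalents: {genome_equivalents(total, GENOME_MB):.1f}")  # 147.5
```

Because real inserts vary in size and soil communities are far from evenly represented, this figure only bounds how much sequence the library could contain, not how many distinct organisms it samples.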

Figure 2. Preparation of a metagenomic library with fosmids (CopyControl®, w.d.).

4.7. Bioinformatics tools for the analysis of cellulase sequences

4.7.1 Sequence alignments

Bioinformatics provides tools to analyze large amounts of information, which is useful for analyzing sequences through comparison with databases such as GenBank/NCBI (National Center for Biotechnology Information), EMBL-Bank/EBI (European Bioinformatics Institute) and DDBJ (DNA Data Bank of Japan). Sequence alignment allows an unknown read to be characterized: through comparison, the function of the read can be annotated and the possible type of gene determined. This tool also makes it possible to find homologous sequences and to carry out phylogenetic analyses (Todaka, 2010; Mount, 2004).

Since alignment provides information about conserved amino acids in cellulase sequences, it was useful for classifying glycosyl hydrolases into families (Henrissat and Bairoch, 1993) as well as for building Hidden Markov Models to recognize each type of cellulase. Even though cellulases are very diverse, comparison with database sequences will help to identify the amino acids that may be important for the specific reaction of cellulases with oil palm empty fruit bunch.

Cellulases are diverse in sequence and conserved sites are difficult to find. Some studies (Knowles, 1987; Todaka, 2010; Bioinformatica de Celulasas, 2008) have compared the amino acid sequences of several cellulases and found conserved sites; nevertheless, these sites comprise only a small number of amino acids. Knowles (1987) found that enzymes produced by the same species had a high percentage of conservation in the linker and tail sequences.

4.7.1.1 Algorithms for sequence comparison

There are two ways to perform sequence alignments: pairwise alignments, where only two sequences are compared, and multiple alignments, where more than two sequences are compared. The first is useful to characterize specific parts of a sequence; the second is preferable to compare sequences of similar length when there is a previous notion of similar function.

Two popular algorithms for sequence alignment are those of Smith and Waterman (1981) and Needleman and Wunsch (1969). The first performs local alignment and the second global alignment. Both are dynamic programming methods that fill an n x m matrix, where n and m are the lengths of the two sequences; each cell (i, j) holds the best score of an alignment ending at residue i of the first sequence and residue j of the second. Scores are accumulated from a substitution matrix, which gives the similarity between each pair of residues, and from penalties for gaps. In the Needleman-Wunsch recurrence a cell score can become negative, whereas Smith-Waterman replaces any negative cell score with zero, which allows the alignment to start and end anywhere and makes it local. Some standard substitution matrices are PAM, GONNET and BLOSUM; each has subclasses depending on the type of search required. The best alignment is the one that maximizes the sum of the scores of the aligned residue pairs, including the gap penalties.

BLAST is a platform supported by NCBI used to carry out local alignments between query reads and all previously reported sequences in the databases. To carry out multiple alignments, programs such as MEGAN (Metagenome Analyzer Software), Clustal X and MUSCLE are used; these programs use algorithms related to those described above. An advantage of these programs is that they also allow phylogenetic analyses (Edgar, 2004; Misener, 2000).
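The dynamic programming shared by the Smith-Waterman and Needleman-Wunsch algorithms can be sketched in a few lines. This is a minimal illustration, not the implementation used by BLAST or MUSCLE: it assumes a linear gap penalty and a simple match/mismatch score in place of a substitution matrix such as BLOSUM, and it returns only the optimal score, not the traceback.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -1, local: bool = False) -> int:
    """Best alignment score by dynamic programming.

    local=False gives Needleman-Wunsch (global); local=True gives Smith-Waterman (local).
    A match/mismatch scheme stands in for a substitution matrix such as BLOSUM.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of an alignment ending at a[i-1] and b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                        H[i - 1][j] + gap,    # gap in b
                        H[i][j - 1] + gap)    # gap in a
            if local:
                score = max(score, 0)         # Smith-Waterman: never below zero
                best = max(best, score)
            H[i][j] = score
    return best if local else H[n][m]


print(align_score("XXACGTXX", "ACGT"))              # global: 4 matches - 4 gaps = 0
print(align_score("XXACGTXX", "ACGT", local=True))  # local: only the ACGT core = 4
```

The example shows why local alignment suits partial matches: the global score is dragged down by the unmatched flanks, while the local score reports only the shared core.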

4.7.2 Molecular modeling of cellulases

The genotypic footprint of cells determines the structure, function and behavior of living organisms. It is important to determine the native structure of proteins because it offers a starting point to understand the nature of protein interactions, as well as the mechanisms by which proteins act.

Tertiary structures of proteins are determined experimentally using X-ray crystallography and NMR; however, new protein sequences are generated far faster than native conformations can be determined experimentally, mainly because of the high costs and long times these techniques require. Therefore, determining tertiary structures by molecular modeling is an acceptable approximation for inferring the functions of new proteins as well as protein-protein and protein-ligand interactions (Xu et al., 2007).

Molecular docking, as a complement to conventional experiments, predicts the structure of the complex formed between a protein and different ligands. This approach assumes that the minimum-energy complex is the most probable one. The method also makes it possible to predict the amino acids that are important in the interaction between the protein and the ligand. There are reports (Kim et al., 2006; Yui et al., 2010; Kumar et al., 2011; Shiiba et al., 2012) of modeled glycosyl hydrolase structures whose interactions with different ligands were evaluated in order to study their sequences and functionality in depth. The results of those studies were similar to those of enzymes of the same family whose structures were determined experimentally, and the collected information is useful for the further characterization of each cellulase family. Thus, these molecular modeling methods should be considered for the high-throughput generation of knowledge about cellulases.

4.7.2.1 Protein structure prediction algorithms

Approaches to predict protein structure can be organized into three kinds: comparative modeling, protein threading and ab initio.

In comparative modeling, the tertiary structure of a protein is predicted by comparing its amino acid sequence with sequences whose tertiary structures are already known. This methodology is based on the premise that similarity in sequence implies similarity in structure (Floudas et al., 2006).

Protein threading is based on the premise that the number of different structures is much more limited than the number of sequences generated by genome projects. This approach finds the best alignment between the target sequence and a known structure in a fold library.

Ab initio methods assume that the native structure of a protein corresponds to the global minimum of free energy under given conditions, and search for the conformation corresponding to the global minimum of a potential energy function. Ab initio prediction solves the tertiary structure starting only from the amino acid sequence; however, it is limited to small proteins because it demands far more computational resources (Leach, 2001).

Fragment-based methods, instead of comparing the whole target sequence with known proteins, compare fragments (short amino acid sequences) of the target with known structures from the Protein Data Bank. Once suitable fragments are found, they are assembled into a structure using scoring functions and optimization algorithms. Because the scoring functions resemble energy functions and fragment assembly by optimization resembles free-energy minimization, this kind of methodology is known as ab initio prediction based on database information, although strictly speaking it is not an ab initio method based on energy minimization (Bonneau et al., 2001; Floudas, 2006).

The work of Bowie et al. (1991) laid the basis for protein threading. It measures how compatible each amino acid of a sequence is with a specific structural environment, defined in terms of secondary structure and solvent accessibility. The key components of this approach are a good database, an energy function and an algorithm that produces a good alignment between the target sequence and the structures in the database.

Loop structures and low-homology regions cannot be predicted by fold recognition; these structures need to be refined by energy minimization in software such as HyperChem.

Energy minimization is a geometric optimization problem; the methods usually implemented in these programs are the conjugate gradient method and the steepest descent (maximum slope) method. Both methods gradually move the coordinates of the atoms, bringing the system closer to a minimum-energy point.

The steepest descent method is not efficient, but it is robust and easy to implement. In this method the forces and potential energy are first calculated and then the new position is calculated with Equation 3, where r is the 3N-dimensional vector of coordinates, F_n is the force and h_n is the maximum displacement.

Eq. 3

The conjugate gradient method is slower in the early steps, but it becomes more efficient as it approaches the minimum-energy point. Equations 4 and 5 correspond to this method. The value of β_k is given by Equation 6 and the convergence criterion is shown in Equation 7, where V is the potential energy and r_i are the optimized coordinates.

Eq. 4

Eq. 5

Eq. 6

Eq. 7

After the energy minimization, it is necessary to verify that the obtained backbone dihedral angles are allowed and free of steric conflicts; this can be done using Ramachandran plots (Ramachandran et al., 1963), in which each possible conformation occupies a defined region.

The different force fields used for molecular systems can be interpreted in terms of four components that calculate the intra- and intermolecular forces in the system. In Equation 8, the first two terms are the deviations of bonds and angles from their reference values, the third term is a function that describes how the energy of the system changes when bonds are rotated, and the last term accounts for the non-bonded interactions.

Eq. 8

The various force fields that have been developed calculate each term of the previous equation in different ways, and may include additional components to improve accuracy. Some of the most widely used force fields are AMBER, CHARMM and GROMOS. The AMBER (Assisted Model Building with Energy Refinement) force field was developed by Weiner et al. (1984) and is shown in Equation 9. The first two terms are related to the flexibility of bonds and angles and correspond to harmonic-oscillator approximations from classical mechanics. There, K_r and K_θ are the Hooke's-law force constants, r is the instantaneous bond length, θ is the angle between connected atoms, and the subscript eq denotes equilibrium values. The next term is related to torsional motion around dihedral bonds and corresponds to a truncated Fourier series, where γ is the phase angle and φ is the dihedral angle. The last two terms of the equation are long-range potentials: the first is a 12-6 Lennard-Jones potential, which accounts for van der Waals forces and hydrogen bonds, and the last is the electrostatic interaction between atoms (Hinchliffe, 2008).

Eq. 9

The interactions between atoms are described by molecular mechanics and molecular dynamics; the latter is governed by Newton's second law (Equation 10).

Eq. 10

There are different algorithms to calculate the future positions and velocities of the molecules from information about their momentum. To obtain the acceleration of the particles (Equation 11), the forces of interaction between molecules must be calculated (force fields); the acceleration is then integrated to obtain the velocity of each atom and its kinetic energy (Equation 12).

Eq. 11

Eq. 12

Well-known algorithms for this process are Verlet and leapfrog. With the first it is possible to calculate the position of each particle at each time step, but the velocities have to be inferred. With the second, the velocities are calculated at half steps (t + ½Δt), but the positions at those instants have to be inferred. Usually the Verlet algorithm is preferred (van der Spoel et al., 2010).

4.7.2.2 Molecular docking fundamentals

The association between proteins and their partners can be described thermodynamically as a spontaneous change in free energy involving factors such as electrostatic and van der Waals interactions, conformational changes, entropy changes and solvation. These are described in Equation 13.

Eq. 13

To find the best conformation of a complex it is necessary to have a search algorithm, an objective (scoring) function and an optimization algorithm. The search is usually simplified by treating the molecules as rigid bodies. Because of the large size of the system and of the solution space, the search must be reduced further. One way to reduce calculation time is the Fast Fourier Transform (FFT), which allows a simpler search algorithm to be used.

Another approach is to discretize the proteins into grids and search translations in real space, where each cell of the mesh contains one or zero atoms (Xu et al., 2007). Then another algorithm (usually stochastic, such as Monte Carlo simulations or genetic algorithms) tests different overlaps of the respective grids, calculating the binding energy between the cells of the grids. This process has to be repeated several times to obtain different configurations of the complex. The final step is to choose the complex with the lowest binding energy, because it is the most stable. Programs like AutoDock perform the search using simulated annealing or genetic algorithms (Morris et al., 1998).
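The grid-plus-stochastic-search idea described above can be illustrated with a deliberately small sketch. It is not AutoDock's algorithm: it assumes a precomputed 3D grid of interaction energies (a hypothetical stand-in for a docking grid map), treats the ligand as a single rigid probe that can only translate between neighboring grid points, and uses Metropolis-style simulated annealing to look for the lowest-energy position. Rotations, torsions and interpolation of energies between grid points, which real docking programs handle, are left out.

```python
import math
import random


def toy_energy_grid(size: int, well: tuple, depth: float = 10.0):
    """Hypothetical grid map: a quadratic energy well centered at `well` (arbitrary units)."""
    return [[[(i - well[0]) ** 2 + (j - well[1]) ** 2 + (k - well[2]) ** 2 - depth
              for k in range(size)] for j in range(size)] for i in range(size)]


def anneal_on_grid(grid, steps: int = 5000, t_start: float = 5.0,
                   t_end: float = 0.01, seed: int = 0):
    """Simulated annealing over grid positions; returns (best_position, best_energy)."""
    rng = random.Random(seed)
    size = len(grid)
    pos = (rng.randrange(size), rng.randrange(size), rng.randrange(size))
    energy = grid[pos[0]][pos[1]][pos[2]]
    best_pos, best_energy = pos, energy
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        dx, dy, dz = rng.choice(moves)
        new = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        if not all(0 <= c < size for c in new):
            continue  # proposal falls outside the grid box
        new_energy = grid[new[0]][new[1]][new[2]]
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if new_energy <= energy or rng.random() < math.exp(-(new_energy - energy) / temp):
            pos, energy = new, new_energy
            if energy < best_energy:
                best_pos, best_energy = pos, energy
    return best_pos, best_energy


grid = toy_energy_grid(16, well=(7, 3, 5))
print(anneal_on_grid(grid))
```

Accepting some uphill moves while the temperature is high is what lets annealing escape local minima on rugged energy maps; on this smooth toy well it simply converges to the minimum, and the best-ever position is tracked so the final answer does not depend on where the walk stops.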

5. METHODOLOGY

5.1. Oil palm empty fruit bunch pretreatment

The African oil palm empty fruit bunch used in the liquid assays was pretreated with sulfuric acid (1% v/v): 100 g of OPEFB were soaked in 2 liters of sulfuric acid and autoclaved at 121°C for 15 minutes. The OPEFB was then washed with deionized water until the pH of the outlet water was close to 6.5, and the wet OPEFB was dried at 45°C for at least 2 days (Umikalsom et al., 1997).

5.2. Selection of solid medium for degradation recognition

To perform the screening on solid media, different concentrations of agar and carboxymethyl cellulose (CMC) were tested with a Gram-positive bacterium with cellulolytic activity, in order to improve the visualization of CMC degradation. The tested media were salt minimal medium (MM; Kasana, 2008) (0.2% NaNO3, 0.1% K2HPO4, 0.05% MgSO4, 0.05% KCl, 0.02% peptone, 12.5 µg/ml chloramphenicol), LB (Luria-Bertani medium), LB 50% and LB 10%; the tested CMC concentrations were 0.05%, 0.2% and 0.5%, the agar concentrations 0.8%, 1.2% and 1.7%, and the incubation times 5, 10 and 15 days.

5.3. Screening of the metagenomic library in liquid medium

Two pools of the metagenomic library were grown in MM containing pretreated empty fruit bunch (1% w/v) or filter paper cut into small pieces (1% w/v) as carbon source. The pools were incubated at 30°C and 200 rpm for 35 days; at the end of this period, the clones remaining in the medium were expected to be those able to degrade cellulose. On the final day, three aliquots of each culture were diluted (1:100) and spread on solid LB medium in order to recover all the colonies that survived. After two days of growth the colonies were identified and labeled.

To recognize cellulose degradation, all the colonies were cultivated on the previously selected solid medium, and incubation was carried out at the best tested conditions. All cultures on solid medium were revealed with the Congo red technique: the colonies were washed off the Petri dishes, and the plates were stained with Congo red (0.2%) for 1 hour. The dye was then removed and the plates were washed with 1 M NaCl for half an hour (Ilmberger and Streit, 2010). This methodology has been reported in other studies for the detection of cellulases on solid media (Kasana, 2008; Ilmberger and Streit, 2010; Wang et al., 2009; Voget et al., 2006; Teather and Wood, 1982).

5.4. Selection of clones with oil palm empty fruit bunch degradation capacity

In order to select two clones for further experiments, three assays were carried out: growth curve measurements, and quantification of sugars in the medium by colorimetric assays and by high performance liquid chromatography (HPLC).

Growth curves in different media were evaluated for each positive colony. The four tested media were MM with CMC (0.2%), cut filter paper (1%) or OPEFB (1%) as carbon source, and LB 50% with 0.2% CMC.

For sugar quantification in the medium, the clones were grown in MM with OPEFB (1% w/v), and after overnight incubation the samples were filtered to remove the OPEFB.

Sugar quantification by colorimetric assay was performed with the dinitrosalicylic acid (DNS) technique: 3 ml of DNS reagent were added to 1.5 ml of sample, and the mixture was heated in boiling water for 5 minutes and then placed in cool water until it reached room temperature. Then 0.5 ml of that mixture was diluted in 2.5 ml of water and the absorbance was read at 540 nm. The sugar concentration was determined from a standard curve (Miller, 1959; Zhang, 2009).

The standard solutions for the detection of sugars in the media by HPLC were pure CMC (2 mg/ml), pure cellobiose (2 mg/ml), pure glucose (2 mg/ml) and mixed solutions of these (1.2 mg/ml, 0.8 mg/ml, 0.5 mg/ml, 0.25 mg/ml and 0.125 mg/ml).
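Converting a DNS absorbance reading into a sugar concentration, as in the standard-curve step above, amounts to fitting a straight line to the standards and inverting it. The sketch below is a minimal ordinary least-squares version. The standard concentrations and absorbances in the example are hypothetical placeholders, not data from this work, and it assumes the response is linear over the standard range and that samples and standards received the same dilution (the optional `dilution` argument is a hypothetical correction for the case where they did not).

```python
def fit_standard_curve(conc, absorbance):
    """Ordinary least-squares line A = slope * C + intercept through the standards."""
    n = len(conc)
    mean_c = sum(conc) / n
    mean_a = sum(absorbance) / n
    sxx = sum((c - mean_c) ** 2 for c in conc)
    sxy = sum((c - mean_c) * (a - mean_a) for c, a in zip(conc, absorbance))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_c
    return slope, intercept


def concentration_from_absorbance(a540, slope, intercept, dilution=1.0):
    """Invert the standard curve; `dilution` rescales if the sample was diluted more than the standards."""
    return (a540 - intercept) / slope * dilution


# Hypothetical glucose standards (µg/ml) and their A540 readings.
standards = [0, 100, 200, 400, 600]
a540 = [0.02, 0.12, 0.22, 0.42, 0.62]
slope, intercept = fit_standard_curve(standards, a540)
print(round(concentration_from_absorbance(0.32, slope, intercept), 1))
```

Readings outside the absorbance range of the standards should be diluted and re-measured rather than extrapolated, since the linearity assumption is only tested within that range.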

5.5. Effect of metal ion, pH and temperature on cellulase activity

In order to improve the degradation rate, the reaction of each extract with OPEFB was carried out under different conditions. The cellulases were collected four hours after inoculation, in the middle of the exponential growth phase. The cultures were centrifuged for 30 minutes at 4500 rpm to harvest the cells, and the pellet was resuspended in pH 8 buffer (50 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA, 0.15% Triton X-100). The cells were disrupted mechanically using a Beadbeater, followed by centrifugation for 10 minutes at 13000 rpm.

The reactions were carried out for 2 hours in volumes of 200 µL: 100 µL of buffer, 50 µL of protein extract (5 mg/ml), 50 µL of 40 mM metal ion solution and 2.5% w/v of OPEFB. The tested temperature range was 10°C to 70°C, the pH range was 1 to 10, and the tested metal ions were K, Mg, Cu and Zn. The reactions testing metal ions were carried out at 50°C and pH 8. The buffers used for these assays were KCl/HCl buffer (pH 1 and 2.5), McIlvaine buffer (pH 4, 5.5 and 7) and Tris-HCl buffer (pH 8 and 10); all reactions for the pH curves were performed at 50°C. The reactions at different temperatures were carried out at the best pH and metal ion previously found for each colony extract.

After incubation, each sample was centrifuged for 1 minute at 13300 rpm to remove the OPEFB. The phenol-sulfuric acid assay (Zhang, 2009) was used to quantify sugars: 50 µL of sample were mixed with 30 µL of 5% phenol, and 180 µL of 96% sulfuric acid were added. After 5 minutes the absorbance was measured at 480 nm. The sugar concentration was determined from a standard curve, and each assay was carried out in duplicate.

5.6. Assays for kinetic model adjustment

Standard Michaelis-Menten assays were performed for each extract at the conditions found to improve sugar production. The substrate (OPEFB) concentrations were 5%, 6.25% and 7.5% (w/v), and samples were collected every three minutes up to 27 minutes. Reducing sugars were quantified by the phenol-sulfuric acid method. The results were fitted to a fractal kinetic model (Equation 1; Kopelman, 1988; Väljamäe et al., 2003). The value of each constant was found by least-squares regression and corroborated with the curve fitting toolbox (cftool) in MATLAB® (Matlab, 2010).

5.7. Determination of molecular weight range by ultrafiltration

The cellulases were concentrated by ultrafiltration with Macrosep® Advance centrifugal devices with cut-offs from 10 kDa to 90 kDa, in order to separate the proteins by molecular weight. Protein was quantified by spectrophotometry at 280 nm with a NanoDrop ND-1000. To verify in which fraction the cellulases were located, two-hour reactions were carried out, measuring reducing sugars by the phenol-sulfuric acid method and glucose with a glucose-oxidase kit.

5.8. Fosmid sequencing and recognition of cellulase sequences

The fosmids were isolated following the protocol provided by Invitrogen for the PureLink® Quick Plasmid Miniprep Kit. The isolated DNA was sequenced at the Huck Institutes of the Life Sciences at Pennsylvania State University by Ion Torrent (314 chip).

The reads were compared with the Escherichia coli genome and the fosmid vector sequence (EU140752.1) in order to eliminate contaminant sequences. Read quality was checked with the FastQC package (Andrews, w.d.), and the reads were trimmed and filtered by quality using the FASTX-Toolkit (Hannon, w.d.). The reads were assembled with CLC Genomics Workbench (CLC bio, w.d.) and Cap3 (Huang and Madan, 1999). To recognize cellulase sequences, a Basic Local Alignment Search Tool search (BLASTX; Altschul et al., 1997) was run from the terminal. This tool translates nucleotide sequences in all six reading frames and compares the resulting amino acid sequences with the non-redundant protein sequence database. Reads that aligned with cellulase sequences of other organisms were evaluated with Pfam (Punta et al., 2012) to determine the family to which each hypothetical cellulase belongs.

5.9. Protein structure prediction of the found cellulases

The 3D structures of the found cellulase sequences were built by homology modeling using SWISS-MODEL® Workspace (Arnold et al., 2006). A geometry optimization was carried out to improve the structures and to find the most probable, lowest-energy conformation of the proteins. This optimization was performed with HYPERCHEM® (Froimowitz, 1993) in vacuo using the Polak-Ribière (conjugate gradient) algorithm, with a termination condition of 0.1 kcal/(Å mol). The energetic stability and the spatial orientation of the amino acids were assessed with QMEAN (Benkert et al., 2011) and PROCHECK® (Laskowski et al., 1993) in order to determine the quality of the structural models.

5.10 Molecular docking of the found cellulases

Molecular docking was performed with AutoDock 4.2 (Morris et al., 2009) using a rigid model for the macromolecule. Since the active site was not known beforehand, the grid was set with a spacing of 0.375 Å and 126 points in each direction, so that the complete surface of the protein was evaluated. Docking simulations were also performed with the crystal structures of the proteins that served as templates for homology modeling. AutoDock proposed fifteen possible binding sites, and those with the lowest binding energy were selected for further analysis.

Simulating the interaction of endoglucanases with amorphous cellulose would require a network of glucose units as ligand, but such simulations demand considerable computational robustness. Thus, all the cellulases were tested with a polysaccharide of five glucose units in order to evaluate their exoglucanase capacity. The putative β-glucosidases were also tested with cellobiose.
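The geometry optimization in Section 5.9 relies on the Polak-Ribière variant of the conjugate gradient method, stopped once the gradient falls below a threshold. The sketch below contrasts it with plain steepest descent on a hypothetical two-variable potential that stands in for a molecular energy surface. It assumes a backtracking (Armijo) line search, a restart to steepest descent whenever the Polak-Ribière β becomes negative, the direction stops being downhill, or every len(x) iterations, and an RMS-gradient stopping rule analogous to the 0.1 kcal/(Å mol) criterion; HyperChem's actual line search and convergence test may differ.

```python
import math


def potential(x):
    """Hypothetical anisotropic quadratic well; minimum V = 0 at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def gradient(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


def rms(g):
    return math.sqrt(sum(gi * gi for gi in g) / len(g))


def line_search(x, d, g, step=1.0, shrink=0.5, c=1e-4):
    """Backtracking (Armijo) line search along a descent direction d."""
    v0 = potential(x)
    slope = sum(gi * di for gi, di in zip(g, d))
    while potential([xi + step * di for xi, di in zip(x, d)]) > v0 + c * step * slope:
        step *= shrink
    return [xi + step * di for xi, di in zip(x, d)]


def minimize(x, method="cg", tol=0.1, max_iter=10000):
    """Return (coordinates, iterations) once the RMS gradient drops below tol."""
    g = gradient(x)
    d = [-gi for gi in g]  # the first step is always steepest descent
    for it in range(max_iter):
        if rms(g) < tol:
            return x, it
        if sum(gi * di for gi, di in zip(g, d)) >= 0:
            d = [-gi for gi in g]  # restart if d is not a descent direction
        x = line_search(x, d, g)
        g_new = gradient(x)
        if method == "cg" and (it + 1) % len(x) != 0:
            # Polak-Ribiere beta, reset to 0 (steepest descent) when negative.
            beta = sum(gn * (gn - go) for gn, go in zip(g_new, g)) / sum(go * go for go in g)
            beta = max(beta, 0.0)
        else:
            beta = 0.0  # steepest descent, or periodic conjugate gradient restart
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, max_iter


for method in ("sd", "cg"):
    coords, iterations = minimize([5.0, 5.0], method=method, tol=1e-6)
    print(method, [round(c, 4) for c in coords], iterations)
```

The periodic restart is a common safeguard: because every few steps are pure steepest descent with a sufficient-decrease line search, the method cannot stall even when the inexact line search spoils conjugacy.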

5.11. Comparison with other cellulase sequences

A multiple alignment was carried out in order to corroborate the conservation of the amino acids that showed interaction with the ligand. Ten cellulase sequences from other organisms were selected for comparison; these sequences were retrieved with PSI-BLAST (Altschul et al., 1997) using three iterations. The alignments were performed with MUSCLE (Edgar, 2004).
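Once a multiple alignment exists, checking whether the residues that contact the ligand are conserved reduces to inspecting the corresponding alignment columns. The sketch below is a hypothetical helper, not part of MUSCLE or PSI-BLAST: it takes already aligned sequences (equal length, '-' for gaps) and reports, for each column, the most frequent residue and the fraction of sequences that carry it. The toy alignment is invented for illustration and does not come from the cellulases studied here.

```python
from collections import Counter


def column_conservation(aligned):
    """For each alignment column, return (most common residue, fraction of sequences with it).

    Gaps ('-') count in the denominator but are never reported as the consensus residue.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] != "-")
        if not counts:
            result.append(("-", 0.0))  # column made only of gaps
            continue
        residue, n = counts.most_common(1)[0]
        result.append((residue, n / len(aligned)))
    return result


# Invented toy alignment (not real cellulase sequences).
toy = ["MKEWG", "MKDWG", "MRE-G", "MKEWA"]
for position, (res, frac) in enumerate(column_conservation(toy), start=1):
    print(position, res, f"{frac:.2f}")
```

A residue predicted by docking to bind the ligand would then be looked up by its alignment column; a fraction close to 1.0 supports the idea that the position is functionally constrained, although identity alone ignores conservative substitutions.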

6. RESULTS AND DISCUSSION

6.1. Library screening

The variations in medium, agar concentration, CMC concentration and incubation time showed that the best medium to recognize CMC degradation is LB 50% with 1.2% agar and 0.2% CMC, after 15 days of incubation (Figure 3).

Figure 3. Selected media stained with Congo red; the yellow halos represent the areas where the CMC was degraded.

After 35 days of incubation in minimal medium, 307 colonies were recovered by spreading aliquots on solid LB. These colonies were grown on the best solid medium for selection and revealed with Congo red stain, and 13 colonies were selected as possible cellulase producers. Even though this methodology is a standard assay for the recognition of cellulases, the probability of finding positive cellulase clones in a metagenomic library has been reported to be low in most cases (Ilmberger and Streit, 2010; Duan, 2010). Thus, the recognition of 13 colonies (about 4.2% of the 307 recovered) that showed cellulolytic degradation with the Congo red staining method could be considered a high rate of positive results.

6.2. Selection of clones with oil palm empty fruit bunch degradation capacity

Growth curves in different media were obtained for the selected clones (Figure 4), corroborating the possible cellulolytic activity of the colonies. The growth curves in LB 50% + 0.2% CMC show that all colonies grew similarly, indicating that the carbon source is the parameter that differentiates the growth of each colony. This result could indicate that the colonies that grew well in media with cellulose as carbon source carry an insert with a cellulase sequence. Colony 8 showed higher growth than the other colonies in all media, including MM with OPEFB. Colonies 1, 2 and 6 were discarded because they did not grow well enough in any medium.

[Figure 4 plot: absorbance (OD600, log scale) versus time (h, 0 to 25) for colonies 1 to 13 and a negative control (-), panels A to D.]

Figure 4. Growth curves (semi-log) of the 13 colonies that showed cellulolytic activity on solid media. A. MM + 1% OPEFB. B. MM + 1% filter paper. C. MM + 0.2% CMC. D. LB 50% + 0.2% CMC.

The ten colonies that grew in the growth-curve assays were cultured again in MM + 0.1% OPEFB, and after 16 hours of incubation the reducing sugars in the media were measured by the DNS assay. Five colonies showed significant differences from the blank (minimal medium without any culture) (Figure 5), so those five were selected for the next steps in the search for the best clones.

Colonies 3, 4, 8, 12 and 13 were cultured again in MM + OPEFB to measure sugars in the media by HPLC, in order to get an idea of which kinds of cellulases could be present in the colonies. The HPLC results showed that the samples of colonies 4, 8, 12 and 13 contained traces of cellobiose (Figure 6; similar results in Supplemental material 1). The peak at around 8 minutes in all samples, including the blank, could be a sugar polymer characteristic of OPEFB. On the other hand, the peak at around 10.3 minutes corresponds to cellobiose, indicating that the samples contained this compound and probably had exocellulase activity. These results do not rule out other cellulase activities, because other polysaccharides could have been produced and also digested. There is also a peak at 15.3 minutes that could be xylose, indicating the possible presence of other kinds of glycoside hydrolases. Colonies 4 and 8 were selected for further evaluation.

[Figure 5 plot: reducing sugars (µg/ml, 0 to 600) for the blank and colonies 3, 4, 5, 7, 8, 9, 10, 11, 12 and 13.]

Figure 5. Reducing sugars in the media measured by the DNS method for the ten colonies selected from the growth curves. Seven colonies showed degradation of OPEFB, but only five generated a considerable amount of sugars.

Figure 6. HPLC result for clone 13. The peak at 10.374 minutes was identified as cellobiose, and the peak at 15.297 minutes could be xylose, one of the components of hemicellulose. The results for the other three clones were similar (Supplemental material 1).

6.3. Effect of metal ions, pH and temperature on cellulase activity

Metal ions have been reported to affect cellulase activity (Chang et al., 2012; Pei et al., 2012; Ramani et al., 2012; Jensen et al., 2011; Karnchanatat et al., 2008); therefore, reactions were tested with 10 mM of each of four metal salts (MgCl2, CuSO4, ZnSO4 and KCl; Figure 7A). Colony 4 was the most affected by the ions: except for Zn2+, all metals increased the amount of reducing sugars present after two hours of reaction. Colony 8, on the other hand, was positively affected by copper and potassium ions, but the final amount of sugars was not very different from that of the reaction without any salt. Potassium was selected for further reactions with both colonies.

In the pH assays (Figure 7B), colony 4 did not produce reducing sugars without the metal ion; nevertheless, the reactions with added KCl showed a wide pH stability range, from 4 to 8.5, with the highest peak at pH 4. In the temperature assays (Figure 7C), the highest sugar production for colony 4 occurred at 50°C with potassium and at 40°C without it. The range of temperatures at which degradation was detected was very similar in both conditions (30°C to 60°C).

[Figure 7 plots: (A) reducing sugars (µg/ml) for colonies 4 and 8 with MgCl2, CuSO4, ZnSO4, KCl or no salt (-); (B) reducing sugars (µg/ml) versus pH (1 to 10) with and without metal ion; (C) reducing sugars (µg/ml) versus temperature (10°C to 70°C) with and without metal ion.]

Figure 7. Effect of metal ions (A), pH (B) and temperature (C) on the degradation of OPEFB after two hours of reaction. Reducing sugars were measured by the phenol-sulfuric acid method.

For colony 8, the assays showed degradation of OPEFB over a range of neutral to basic pH values, with the highest amounts of reducing sugars at pH 7 (with KCl) and pH 8.5 (without KCl). Degradation was detected from 30°C to 60°C, with the highest sugar production at 40°C for the reactions without metal ion and at 30°C for the reactions with KCl.

Across all conditions, KCl enhanced the degradation performed by the proteins produced by colony 4, whereas the cellulases produced by colony 8 were less effective in the presence of the ion. Most known cellulases are active within the temperature and pH ranges mentioned above (Sun and Cheng, 2002; Cockburn and Clarke, 2011; Kim et al., 2008; Beloqui et al., 2010). However, the temperatures at which these cellulases were most efficient are lower than the range usually reported.

6.4. Kinetic model fitting

The Michaelis–Menten model was proposed as an initial approximation for the kinetics of OPEFB degradation by the cellulases of the clones. However, the results (Figure 8) show that the rate of reducing sugar production changes with reaction time. This was expected, since cellulose degradation products such as cellobiose and glucose have been reported to alter the reaction kinetics at high concentrations (Sun and Cheng, 2002). Other kinetic models for cellulases have been proposed (Bansal, 2009), all of which include the effects of diffusion and of cellulase adsorption to cellulose. Because the reaction takes place in a two-phase (solid-liquid) system and cellulose is a heterogeneous substrate (Xu and Ding, 2006), the kinetics could be fitted to a fractal model (Equation 1, Table 2) (Kopelman, 1988; Väljamäe et al., 2003). Previous reports of fractal kinetics for cellulases (Wang and Feng, 2010; Yao et al., 2011) found that the fractal dimension of these reactions also depends on the cellulose pretreatment, although it is around 0.33.
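As a rough illustration of how h and k can be estimated from a time course, the sketch below assumes the simplest fractal-like rate law, in which the rate coefficient decays as k·t^(-h) (Kopelman, 1988) and the substrate is in excess, so that dP/dt = k·t^(-h) integrates to P(t) = k·t^(1-h)/(1-h). This form is an assumption for illustration: the exact Equation 1 used in this work is not reproduced here, the function names are hypothetical, and the time course is synthetic, not experimental data. Under this assumption, a straight-line fit of log P against log t has slope 1 - h.

```python
import numpy as np


def fractal_product(t, k, h):
    """Product under an ASSUMED fractal-like rate law dP/dt = k * t**(-h).

    Integrating from 0 to t (valid for 0 <= h < 1) gives P = k * t**(1 - h) / (1 - h).
    """
    return k * t ** (1.0 - h) / (1.0 - h)


def fit_fractal(t, p):
    """Estimate (k, h) by linear regression of log P on log t.

    log P = log(k / (1 - h)) + (1 - h) * log t, so the slope equals 1 - h.
    """
    slope, intercept = np.polyfit(np.log(t), np.log(p), 1)
    h = 1.0 - slope
    k = np.exp(intercept) * (1.0 - h)
    return k, h


# Synthetic, noise-free time course (minutes) built from hypothetical
# parameters, used only to check that the fit recovers them.
t = np.array([2.0, 5.0, 10.0, 15.0, 20.0, 25.0])
p = fractal_product(t, k=30.0, h=0.33)

k_est, h_est = fit_fractal(t, p)
print(f"k = {k_est:.2f}, h = {h_est:.3f}")
```

Because the units and magnitude of k depend on the exact form of the model, values obtained with this sketch are not directly comparable with those in Table 2. With noisy data, a nonlinear least-squares fit on the untransformed data is usually preferable to the log-log regression used here.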
The effect of the metal ion is reflected in the value of the kinetic constant k (Table 2): for each colony, the lower value corresponds to the colony 4 extract without metal ion and to the colony 8 extract with the ion. This trend agrees with the previous results. The metal ion could be affecting the 3D structure of each protein or its affinity for the substrate.
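The opposite effect of the metal ion on the two extracts can be made explicit as a relative change in k. The sketch below uses the k values reported in Table 2; the dictionary layout and the function name are illustrative only.

```python
# Kinetic constants k from Table 2, keyed by (colony, metal ion added).
K_VALUES = {
    (4, False): 0.000183,
    (4, True): 0.000210,
    (8, False): 0.000240,
    (8, True): 0.000145,
}


def relative_change_with_ion(colony):
    """Percentage change in k when the metal ion (KCl) is added."""
    without_ion = K_VALUES[(colony, False)]
    with_ion = K_VALUES[(colony, True)]
    return 100.0 * (with_ion - without_ion) / without_ion


for colony in (4, 8):
    print(f"colony {colony}: {relative_change_with_ion(colony):+.1f}%")
```

This gives about +14.8% for colony 4 and -39.6% for colony 8, consistent with KCl favoring the colony 4 extract and hindering the colony 8 extract.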

[Figure 8 plots: reducing sugars (µg/ml) versus time (min) for panels A to D, with experimental data at 5%, 6.25% and 7.5% OPEFB and the corresponding fractal kinetic model curves.]

Figure 8. Experimental results of the kinetic assays for colony 4 without metal ion (A), colony 4 with metal ion (B), colony 8 without metal ion (C) and colony 8 with metal ion (D). All curves were fitted to a fractal kinetic model (continuous lines). The percentages are the amounts of OPEFB in each reaction.

Previous studies (Väljamäe et al., 2003) showed that the fractal dimension (h) changes with the initial substrate concentration, although this behavior depends on the kind of substrate and on the cellulose itself. In the assays with the proteins from colony 4, regardless of the presence of the metal ion, and with the protein extract of colony 8 without metal ion, h decreases as the amount of substrate increases. A decrease in h represents a reduction in adsorption time or an increase in diffusion rate. Conversely, the increasing trend of the fractal dimension in the reaction of the colony 8 extract with metal ion indicates that this reaction is more limited by transport phenomena.
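The direction of change of h with substrate loading can be checked directly against the values reported in Table 2. The sketch below only classifies each series as increasing, decreasing or mixed; it is a reading aid with hypothetical names, not a statistical test, and with three loadings per condition no significance can be attached to the trends.

```python
# Fractal dimension h from Table 2, ordered by OPEFB loading (5%, 6.25%, 7.5%),
# keyed by (colony, metal ion added).
H_VALUES = {
    (4, False): (0.3443, 0.3285, 0.3162),
    (4, True): (0.1294, 0.1268, 0.1213),
    (8, False): (0.3259, 0.2825, 0.2699),
    (8, True): (0.3431, 0.3515, 0.3875),
}


def trend(values):
    """Return 'decreasing', 'increasing' or 'mixed' for a sequence of numbers."""
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(s < 0 for s in steps):
        return "decreasing"
    if all(s > 0 for s in steps):
        return "increasing"
    return "mixed"


for (colony, ion), h in H_VALUES.items():
    label = "with ion" if ion else "without ion"
    print(f"colony {colony} {label}: h is {trend(h)} with OPEFB loading")
```

Only the colony 8 extract with metal ion shows an increasing h, which matches the interpretation given above.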

Table 2. Fractal dimension (h) and kinetic constant (k) for the reactions with the protein extracts of colonies 4 and 8, carried out with (+) and without (-) metal ion. Percentages are the OPEFB loadings of each reaction.

Colony  Metal ion  k         h (5%)  h (6.25%)  h (7.5%)  R-square (5%)  R-square (6.25%)  R-square (7.5%)
4       -          0.000183  0.3443  0.3285     0.3162    0.81           0.83              0.95
4       +          0.000210  0.1294  0.1268     0.1213    0.93           0.89              0.97
8       -          0.000240  0.3259  0.2825     0.2699    0.87           0.87              0.92
8       +          0.000145  0.3431  0.3515     0.3875    0.94           0.92              0.89

6.5. Determination of molecular weight range by ultrafiltration

Ultrafiltration by centrifugation concentrates the proteins and divides the sample into two solutions: the solution below the membrane contains the proteins between 10 kDa and 90 kDa, and the solution above the membrane contains the proteins with higher or lower molecular weights.

The reducing sugars produced with each final solution are shown in Table 3. For both clones, most of the cellulose degradation was detected in the reactions with the solution above the membrane; nonetheless, the colony 4 extract below the membrane also produced glucose. This suggests that the cellulases produced by colony 8 have molecular weights higher than 90 kDa or lower than 10 kDa, whereas colony 4 produces cellulases in both molecular weight ranges, possibly including a β-glucosidase with a molecular weight between 10 kDa and 90 kDa.

Most cellulases were expected to be in the solution below the membrane, since the majority of cellulases have been reported to lie between 20 and 100 kDa (Niranjane, 2006; Wilson, 2011). However, because these cellulases are expressed by bacteria, they may belong to a cellulosome structure, in which case the molecular weight of the complex is much greater. Tables 4 and 5 show the concentration and quantity of protein in each final solution.
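The recovery percentage reported in Table 5 is a mass balance over the ultrafiltration step: the protein found above and below the membrane divided by the protein loaded before centrifugation. A minimal check using the Table 5 quantities is sketched below; the dictionary keys and function name are illustrative, not part of the original analysis.

```python
# Protein quantities (mg) from Table 5, per colony.
PROTEIN_MG = {
    4: {"before": 36.75, "above": 13.08, "below": 16.43625},
    8: {"before": 13.65, "above": 3.108, "below": 7.881},
}


def recovery_percent(q):
    """Protein recovered in both fractions as a percentage of the loaded protein."""
    return 100.0 * (q["above"] + q["below"]) / q["before"]


for colony, q in PROTEIN_MG.items():
    print(f"colony {colony}: {recovery_percent(q):.1f}% recovered")
```

Both colonies give about 80% (80.3% and 80.5%), matching the recovery discussed in this section.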
For both clones, the protein concentration of the solution above the membrane is higher than that of the initial extract. Since sugar production was mainly detected in the reactions made with this solution, it can be concluded that the proteins were effectively concentrated. In addition, the recovery is around 80%, which is higher than that usually obtained with other concentration methods.

Table 3. Reducing sugars (µg/ml) after 2 hours of reaction with the solutions obtained by ultrafiltration, measured by the phenol-sulfuric acid method (PSA) and the glucose-oxidase kit (GOx).

         Above the membrane     Below the membrane
Colony   PSA        GOx         PSA        GOx
4        497.0      159.6       288.7      151.9
8        853.8      44.6        294.5      31.3

Table 4. Protein concentration (mg/ml) in the final solutions after ultrafiltration.

                         Colony 4   Colony 8
Before centrifugation    7.35       2.73
Above membrane           13.08      5.18
Below membrane           4.87       2.22

Table 5. Protein quantity (mg) in the final solutions after ultrafiltration.

                              Colony 4   Colony 8
Before centrifugation         36.75      13.65
Above membrane                13.08      3.108
Below membrane                16.43625   7.881
Sum of resulting solutions    29.51625   10.989
Recovery (%)                  80.3       80.5

6.6. Fosmid sequencing and identification of cellulase sequences

Each fosmid DNA extract generated 100,000 reads with a maximum length of 326 bases. Read quality was checked with the FastQC application (Supplemental material 2), and only reads with quality scores higher than 20 were assembled. For each colony, approximately 70,000 reads were assembled, and the resulting 7,000 contigs were compared using Blastx. Among the sequences that aligned with cellulases, two contigs (791, 1847) from colony 4 and three contigs (2669, 3141, 6490) from colony 8 were selected for Pfam characterization (Table 6). Three glycosyl hydrolase families were found in the five contigs: β-glucosidases belong to families 1 and 3, endoglucanases to family 8, and exoglucanases to family 3. Therefore, the colony 4 insert could encode endoglucanase and β-glucosidase proteins, whereas the colony 8 insert could produce all three kinds of cellulases. The comparison of the contig sequences with the consensus sequences of the Hidden Markov Models (HMM) shows that contigs 791, 1847, 3141 and 6490 do not contain the complete sequence of the cellulase domains; in addition, the e-values are low enough to rule out that the alignments occurred by chance. Moreover, the alignment predicts that amino acid 373 of contig 1847 could be part of the active site of the protein.

Table 6. Identification of domains in each contig with Blast results related to cellulases. Alignment and HMM consensus positions are given as from–to.

Colony  Contig  Length  Glycosyl hydrolase family  Alignment  HMM consensus  E-value    Predicted active site
4       791     271     8                          2–257      93–342         9.00E-79   -
4       1847    475     1                          3–472      3–454          8.90E-128  373
8       2669    370     8                          3–348      1–342          1.60E-115  -
8       6490    417     1                          10–414     53–454         1.20E-121  -
8       3141    631     3 N-terminal domain        25–226     1–194          1.00E-51   -
8       3141    631     3 N-terminal domain        270–330    238–298        1.20E-05   -
8       3141    631     3 C-terminal domain        374–629    1–227          4.20E-69   -

6.7. Structure modeling and molecular docking

The structures were modeled by homology, using as templates cellulases produced by Escherichia coli K-12 substr. MG1655 and Pseudoalteromonas sp. BB1. The sequences of contigs 791, 1847 and 2669 had high percentages of similarity to the template structures; however, contigs 6490 and 3141 of colony 8 had low similarities (Table 7).
Nonetheless, the e-values of these alignments were low, indicating that the modeling based on those templates is