The shape analysis described in Section 3.5.2 is batch analysis by nature, which often ignores the local complexity of individual particles. Batch analysis becomes problematic for complicated molecules such as protein-DNA complexes, where the complex needs to be analyzed in parts and manual inspection is often required for proper tracing, cropping, measuring, and classification of the regions of interest. None of the existing AFM software is designed to process particles of this complexity, so establishing a workflow with these programs is difficult and highly inefficient. To tackle this problem, Image Metrics introduces a dedicated module called the Region Inspector (Figure 3.2) to facilitate high-throughput single-molecule analysis, such as the analysis of protein-DNA complexes. Unlike shape analysis, which allows users to conduct batch particle filtering, alignment, correlation averaging, and ultimately classification in a single package, single-molecule analysis allows users to frame complex particles such as protein-DNA complexes into regions of interest and precisely measure their conformations in a sequential manner. Whereas shape analysis excels in throughput and big data, region analysis excels in measurement precision. The two modules complement each other and are tightly integrated in Image Metrics. Together, they allow users to conduct feature analysis with both precision and scale.
Here, the workflow of single-molecule protein-DNA analysis in Image Metrics is described. In single-molecule AFM studies of protein-DNA complexes, the following aspects are typically analyzed: (1) Specificity (defined as the relative binding affinity for the specific site versus a non-specific site): by measuring the position of the protein complex along the DNA [150]; (2) Stoichiometry and binding affinity: by measuring the number of proteins in each complex, the number of complexes on each DNA, and the numbers of free protein and DNA molecules [84, 150]; (3) Conformation: by evaluating the structure of the complex and the DNA bending at the complex [5, 151]. The stoichiometry is the number of proteins bound to DNA and/or within a protein complex; the binding affinity is a measure of how tightly the protein binds to the DNA; and the conformation yields information about the structure of the protein-DNA complexes and how it relates to function. Together, this information makes AFM a very powerful tool for dissecting how biology works at the molecular level, but it also presents a significant challenge because of the complexity and quantity of the analysis.
A. Identifying the Protein-DNA Complex
First, we need to isolate the protein-DNA complexes from the image. This usually happens after the image has been processed in the Image Processor (Section 3.4). Typically, features such as protein-DNA complexes (called particles in AFM software terminology) can be cropped manually or automatically. In the automated procedure, particles are masked and detected as described earlier (Section 3.5.1). Their particle metrics (Table 3.3) are then analyzed based on the masked region of interest, and features of interest can be further located by finely filtering on those metrics. As in many AFM programs, masked particles can be filtered by their particle metrics after they are analyzed (post-analysis filtering). In Image Metrics (and in SPIP), users can also filter the particles before the analysis (pre-analysis filtering) to save computation time on irrelevant features (see also Section 3.5.1). Users can use any combination of metrics in Image Metrics to pre-filter particles of interest (Figure 3.12D, Figure 3.18A). The most common metrics are size (area, height, and volume), fiber length, and whether the particle borders the image edges [91, 96, 152].
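A pre-analysis filter of this kind can be sketched as follows. This is only an illustration of the idea, not Image Metrics code: the `Particle` fields, the `prefilter` function, and the threshold parameters are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    # Hypothetical particle metrics; names and units are assumptions.
    area_nm2: float
    height_nm: float
    fiber_length_nm: float
    touches_edge: bool


def prefilter(particles, min_area_nm2, min_length_nm, max_length_nm):
    """Keep particles that are off the image border, large enough,
    and whose fiber length falls within the requested range."""
    return [
        p for p in particles
        if not p.touches_edge
        and p.area_nm2 >= min_area_nm2
        and min_length_nm <= p.fiber_length_nm <= max_length_nm
    ]
```

Only the particles that pass such a filter would then go through the more expensive per-particle analysis.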
Because of the threshold applied, image artifacts, features that are themselves too low, or inadequate filtering parameters, the masks can sometimes split a sample feature, bridge two sample features, or mask features that users do not want to analyze. In these scenarios, Image Metrics (like AFM programs such as Asylum Research software and SPIP) allows users to manually add, remove, split, and group features. Users can also automatically group features using a morphological open operation (erosion followed by dilation) so that separated features that are perceptibly linked can be reunited. More details on the particle detection technique are discussed in Section 3.5.1.
After features (particles) are identified, they can be cropped (Figure 3.18A – blue box) and zoomed in on for further inspection (Figure 3.18B). In other AFM software, this process is performed mostly manually. For example, in SPIP, users can use the inspection box feature to crop a region of the image, which opens a new window showing the zoomed-in area of the image [96]. In Asylum Research software, users can pop up an inspection window of the particle itself, but they cannot adjust the window size (in case the view of a particle is cropped), nor can they easily navigate among the particles. In Image Metrics, however, the process is mostly automatic. Individual particles are enlarged and inspected sequentially within the Region Inspector module (Figure 3.2), where the measurements on each particle (specificity, stoichiometry, etc.) take place. The next particle automatically comes into view once the measurement on the current particle is finished. The separate inspection module removes the rest of the image from view, so that only the features of interest are presented to the user.
B. Specificity
The specificity of protein binding to the DNA can be estimated by measuring the position distribution of proteins along the DNA (Figure 3.18E) [150]. To measure the position of a protein complex along the DNA, the DNA molecule is traced and profiled (Figure 3.18B, Figure 3.17). The tracing of a DNA molecule is a type of fiber analysis (Table 3.1, Fiber Analysis), which is found in many specialized software packages³⁹ for measuring neurites (e.g. NeuronJ [94]) as well as DNA (e.g. DNA Trace [104], FiberApp [105]). In most AFM-specific software packages (e.g. Nanoscope Analysis and Gwyddion), the standard profile analysis (also known as section analysis) does not offer a freehand tracing option, which makes them unsuitable for analyzing DNA molecules. Other software offers a freehand tracing option (e.g. Asylum Research software and NeuronJ). However, freehand tracing is performed manually, with users locating and tracing individual molecules, which makes it time-consuming, and its accuracy can suffer from users’ subjective choices⁴⁰. Automatic and semi-automatic tracing options [152, 153] do not have this problem. In Image Metrics, automatic and semi-automatic tracing use morphological transforms⁴¹ and geodesic distance transforms⁴² to trace and profile untangled linear DNA molecules [152, 154]. A step-by-step procedure of this operation is shown in Figure 3.17. A similar automatic tracing feature is also found in SPIP [96] and Asylum Research software⁴³. After the DNA fiber is traced, the DNA fiber length can be measured as described previously [153, 155]. In Image Metrics, the Euclidean distance is used to estimate the length⁴⁴. In addition to automatically tracing the DNA, the proteins’ positions, represented by peaks on the DNA profile, can also be located and recorded automatically (Figure 3.18B inset), which is not possible in other AFM programs. The program can also record the short arm distance (the distance from the protein to the nearest end of the DNA).
39 These software packages are either plugins for ImageJ (such as NeuronJ) or MATLAB-based applications (DNA Trace and FiberApp). A list of fiber analysis software (including ones not listed in the text) can be found in [105].
40 In my own testing (data not shown), depending on the orientation of the molecule and how the freehand trace is digitized, the fiber length obtained from freehand tracing can be systematically longer or shorter than that from automatic tracing methods.
41 The specific morphological transformation used is the skeletonize or thinning transformation. See also Section 3.5.1. This transformation is required for semi-automatic tracing.
42 This distance transform, computed with the bwdistgeodesic function, allows the tracing of the longest non-crossover distance between any two fiber points, which are essentially the two farthest distal points (ends) of the fiber. The transform measures the shortest distance between the two fiber ends by eliminating distances from branched fiber detours. This transformation, in conjunction with the morphological thinning transformation, is required for fully automatic tracing.
However, the automatic procedure can fail to trace the DNA properly when the DNA molecule cannot be masked properly or when it is closed, branched, or tangled with other DNA strands. To resolve these more complicated scenarios, users can (1) tailor, split, or bridge the masks (Section 3.5.1), (2) use region of interest tools (Figure 3.18B, blue crops), or (3) use the manual freehand or semi-automatic options to assist in tracing the DNA.
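The automatic tracing and peak location described in this section can be illustrated with a minimal sketch. It is not the Image Metrics implementation: it assumes the mask has already been thinned to a connected, tree-like skeleton given as a set of (row, column) pixels, it substitutes a plain Dijkstra search for the bwdistgeodesic transform, and it assumes steps between neighboring pixels cost 1 (orthogonal) or √2 (diagonal). All function names are hypothetical.

```python
import heapq
import math


def geodesic(skeleton, start):
    """Dijkstra over 8-connected skeleton pixels from `start`.
    Returns (distance map, predecessor map)."""
    dist = {start: 0.0}
    prev = {start: None}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                if nxt == (r, c) or nxt not in skeleton:
                    continue
                nd = d + math.hypot(dr, dc)
                if nd < dist.get(nxt, math.inf):
                    dist[nxt] = nd
                    prev[nxt] = (r, c)
                    heapq.heappush(heap, (nd, nxt))
    return dist, prev


def trace_fiber(skeleton):
    """Two-sweep search: the farthest pixel from an arbitrary seed is one
    fiber end; the farthest pixel from that end is the other end.
    Returns (path of pixels, path length in pixel units)."""
    seed = next(iter(skeleton))
    dist, _ = geodesic(skeleton, seed)
    end_a = max(dist, key=dist.get)
    dist, prev = geodesic(skeleton, end_a)
    end_b = max(dist, key=dist.get)
    path, node = [], end_b
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return path, dist[end_b]


def protein_position(path, heights, pixel_nm):
    """Take the highest point of the height profile along the path as the
    protein. Returns (position along the fiber, short arm distance) in nm."""
    arc = [0.0]
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        arc.append(arc[-1] + math.hypot(r1 - r0, c1 - c0) * pixel_nm)
    peak = max(range(len(path)), key=lambda i: heights[path[i]])
    return arc[peak], min(arc[peak], arc[-1] - arc[peak])
```

Short side branches drop out because they do not lie on the path between the two farthest ends. Closed, heavily branched, or tangled skeletons violate the tree-like assumption, which matches the failure cases listed above.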
Figure 3.17 DNA Tracing and Fiber Analysis
The figure shows the automatic fiber tracing feature in Image Metrics. (top-left) A DNA image. (top-middle) The DNA is masked (orange) by a height threshold, and the mask is morphologically transformed into a skeleton (green). (top-right) The fiber (black) is determined as the longest path between any two points inside the skeleton without looping, which is the shortest path between the two ends of the skeleton. The location of the protein (cross mark) is determined as the location of the peak in the height profile of the fiber (bottom figure). The start and end anchor points of the DNA fiber used in the height profile are marked by the square (start point) and the circle (end point). (bottom) The height profile of the fiber from the top-right figure. The location of the protein is marked by a cross symbol.
C. Stoichiometry and Binding Affinity
Several stoichiometry metrics can be estimated, including the number of protein complexes per DNA, the number of proteins per complex, and ultimately the number of proteins per DNA. To extract the number of proteins per complex, we use volume analysis [84, 91, 150, 156]. In AFM terminology, volume describes the summation of pixel intensities within a mask, and it is usually proportional to the size (area and intensity) of the masked feature. Like many AFM software packages (e.g. Asylum Research software, SPIP, and Gwyddion), Image Metrics supports volume measurement as a particle metric (Table 3.3). Particle volumes can be used to extract stoichiometry information because the volume of a protein complex is roughly proportional to the number of proteins inside the complex [84]. To measure the volume of a protein complex on the DNA, a typical workflow includes the following steps: (1) The protein complex is first masked, and then separated from the DNA using region of interest (ROI) tools (Figure 3.18B, yellow crops). In Image Metrics, a freehand drawing tool is provided to draw ROIs or reverse ROIs. Similar ROI tools are also available in SPIP. (2) The volume of the protein complex is measured within the ROI, and its distribution is plotted (Figure 3.18C). The number of protein complexes on the DNA is also counted. In Image Metrics, this counting is performed automatically as the volumes are being measured. (3) If the protein is known to be predominantly monomeric, the first peak of the volume distribution can be used to identify the monomer state of the protein complex⁴⁵. Users can also identify the stoichiometry of the peak by comparing the volume of free protein to that of DNA-bound protein. (4) Finally, the number of proteins per complex and per DNA can be calculated by normalizing the volume of the complex(es) to that of a single protein (Figure 3.18D).
45 The assumption here is that the protein will have a notable monomer state population that presents itself as a peak in the volume distribution, which may not be the case. Therefore, it is recommended to perform verification using other methods such as dynamic light scattering (DLS), analytical ultracentrifugation, or mass spectrometry.
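The volume normalization in steps (2) to (4) can be sketched as follows. This is a simplified illustration rather than the Image Metrics code: it assumes the height image and the mask are equally sized nested lists, multiplies the summed heights by a pixel area to obtain a physical volume (an assumption added here), and takes the monomer volume as a value the user has already determined. The function names are hypothetical.

```python
def particle_volume(heights, mask, pixel_area_nm2, baseline_nm=0.0):
    """Sum (height - baseline) over the masked pixels, times the pixel area."""
    total = 0.0
    for row_h, row_m in zip(heights, mask):
        for h, m in zip(row_h, row_m):
            if m:
                total += (h - baseline_nm) * pixel_area_nm2
    return total


def proteins_per_complex(complex_volume, monomer_volume):
    """Estimated protein count: complex volume in units of one monomer."""
    return complex_volume / monomer_volume


def proteins_per_dna(complex_volumes, monomer_volume):
    """Sum of the per-complex estimates for all complexes on one DNA."""
    return sum(v / monomer_volume for v in complex_volumes)
```

The `baseline_nm` parameter stands in for the local surface height offset, one of the factors that can bias the volume measurement.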
It should be noted that the volumes of protein-DNA complexes can vary greatly because the conformation of the complex can affect its volume measurement⁴⁶, so the number of proteins per complex estimated through normalization in step (4) above may not be accurate. However, since the volumes of larger complexes usually do not overlap with those of smaller complexes, a peak at a higher volume roughly corresponds to a complex with a larger number of proteins. In addition, because volume measurement depends directly on the height and area measurements, any factors that affect those measurements also affect the volume measurement. For example, the measured height may be higher or lower if the surface is not normalized properly at the local level or if tip-sample interactions change (such as through tip degradation or contamination). In that case, users can try to flatten the region locally (instead of flattening the whole image), calculate the surface height and subtract it from the height measurement of the feature, or normalize the volume distribution to help offset the difference. The masked area may also be larger or smaller than desired depending on the height threshold used for masking⁴⁷, and users may have to readjust the threshold as they take the volume measurement.
In addition to counting protein-DNA complexes, free proteins and DNAs can also be counted (Figure 3.18A), and binding affinity can be estimated as described previously [150]. In Image Metrics, free proteins and DNAs can be filtered and counted through particle analysis (Section 3.5.2A). A module called Particle Counter can also be used to count particles manually, where users mark the particles with the mouse cursor. Great caution must be taken in validating the binding affinity, however. It can be overestimated because of deposition artifacts such as random landing⁴⁸ and/or local binding⁴⁹ events, or underestimated because different types of biomolecules bind to the surface with different preferences.
46 For instance, a protein sitting tall will likely have a larger volume than the same protein lying flat on the surface because of the tip-dilation effect.
47 For example, if the surface is not flat, the height threshold may over-mask some particles while under-masking others, resulting in larger or smaller masked areas than desired.
D. Conformation
Perhaps the most outstanding strength of AFM in single-molecule protein-DNA studies is that we can directly visualize the conformation of protein-DNA complexes under physiological conditions with relative ease. We can describe the conformation of a complex qualitatively, such as whether a complex loops the DNA or how a complex sterically binds to the DNA, or quantitatively, using particle metrics (Table 3.3) and/or DNA bend angles at the complex [91, 156, 157]. For instance, we can measure both the external bend angles between the two DNA arms extending out of the protein complex (Figure 3.18B, Figure 3.18F) and the internal bend angles embedded within the protein complex as visualized via DREEM (CHAPTER 2). Both bend angle metrics reflect internal conformations of a protein-DNA complex [154, 157]. Combined with biochemical functional studies, these conformations can then be correlated with their biological functions based on where and when they occur.
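As an illustration of the external bend angle measurement, the sketch below computes the angle from three user-picked points: the complex center and one point on each DNA arm. The function names are hypothetical. Reporting the bend angle as 180° minus the angle between the arms, so that straight DNA gives 0°, is an assumption; the convention used by a given tool should be checked.

```python
import math


def arm_angle_deg(vertex, arm_a, arm_b):
    """Angle (0 to 180 degrees) between the two arms meeting at `vertex`."""
    ax, ay = arm_a[0] - vertex[0], arm_a[1] - vertex[1]
    bx, by = arm_b[0] - vertex[0], arm_b[1] - vertex[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    return math.degrees(math.atan2(abs(cross), dot))


def bend_angle_deg(vertex, arm_a, arm_b):
    """Deviation from a straight line: 0 for unbent DNA (assumed convention)."""
    return 180.0 - arm_angle_deg(vertex, arm_a, arm_b)
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers near 0° and 180°.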
Image Metrics provides users with powerful tools to visualize and measure the conformation. For example, unlike other AFM programs (with the exception of SPIP), where users have to resort to third-party screen protractor tools, Image Metrics has a built-in angle measurement tool that provides interactive angle measurement, and the measurement result can be recorded directly into a database (Section 3.5.3E). Another example is profile analysis (also discussed in Section 3.5.3B), which is often used to dissect and inspect the topography along a certain path [98, 108]. Profile analysis is most powerful when used in conjunction with visualization techniques (3D, contour, and image overlay; see Section 3.3C), especially when users want to quantify the conformation of the DNA and the protein along a 2D slice, such as their locations relative to the surface and/or to each other (Figure 3.18I).
48 Random landing – protein can randomly land on the DNA
Compared to other AFM software, profile analysis in Image Metrics is more flexible, more interactive, and more comprehensive. For instance, while many software packages offer only a line profile tool (e.g. Nanoscope Analysis, Gwyddion), Image Metrics offers both line profile and freehand profile tools (Figure 3.18H). Users can draw multiple lines on the same image (Figure 3.18H), plot multiple profiles on the same graph (Figure 3.18I, graph), mark multiple locations to measure (Figure 3.18I, vertical lines in graph), and take measurements of the coordinates of markers and the relative distances between them (Figure 3.18I, table). Unlike other software, where distance measurements are taken between markers within the same profile, Image Metrics allows measurements to be taken across different profiles. Also unique to Image Metrics, users can interactively change the lines’ positions, sizes, and colors after they are drawn (Figure 3.18H), and their profiles are updated automatically. In addition, color-tagged markers are placed on both the image and the graph for easy tracking of feature spots. A data channel slider is also implemented so that users can visualize and measure the profiles of the same lines in different data channels, which can be very useful if the data channels contain different types of information (e.g. phase and amplitude) or images scanned as a time series. For example, a novel use of this feature is to track the movement of a DNA molecule by tracking the profile along a user-defined path (Figure 3.9A-B). The profile tool allows users to take snapshots of the profiles at different time stamps of the images (after the time unit is calibrated, see Section 3.3A) and measure the displacement and velocity of the movement. Finally, Image Metrics provides powerful morphological transformations (Section 3.5.1) not seen in other AFM software, which could open up additional conformational metrics that can be measured (Figure 3.18G). One of the transformations, skeleton, is used to trace the DNA fiber as described earlier (Section 3.5.3B).
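The displacement and velocity measurement from time-stamped profiles can be sketched as follows. The sketch assumes each frame yields a height profile sampled at known positions along the same user-defined path, and that the tracked feature is the highest point of each profile. These are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def peak_position(positions_nm, heights):
    """Position along the path where the profile is highest."""
    i = max(range(len(heights)), key=heights.__getitem__)
    return positions_nm[i]


def displacement_and_velocity(feature_positions_nm, times_s):
    """Net displacement (nm) and mean velocity (nm/s) between the first
    and last frames of a calibrated time series."""
    displacement = feature_positions_nm[-1] - feature_positions_nm[0]
    elapsed = times_s[-1] - times_s[0]
    return displacement, displacement / elapsed
```

For example, feeding the per-frame results of `peak_position` into `displacement_and_velocity` with the calibrated frame times gives the mean velocity of the feature along the path.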
Figure 3.18 Single-Molecule Analysis
A. DNA molecules are masked, filtered, labeled, categorized, counted, and traced. The colors represent different configurations of DNA molecules: Purple – free DNA with no protein bound; Dark red – tangled DNA; Light blue – DNA trapped by a cluster of proteins; Red – DNA longer than 450 nm; Green – DNA shorter than 450 nm. B. Inspection of one DNA molecule. DNA is traced automatically and profiled (top inset). Protein positions on the DNA are identified, labeled, marked, and recorded both on the profile and on the image. The DNA bend angle is measured at the first protein. Region of interest (ROI): the blue ROI blocks a certain area from being analyzed and traced; the yellow ROI crops areas to be analyzed. Bottom inset: topographic visualization of the DNA molecule with phase overlay. C. Volume distribution of protein-DNA complexes (e.g. yellow crops in B) obtained by volume analysis. D. Stoichiometry of protein to DNA as calculated from counting and volume normalization of the complexes as in B and C. E. Position distribution of protein-DNA complexes as plotted from the position measurement (B, top inset). F. DNA bend angle distribution as plotted from the angle measurement in B. G. Morphological transformation of masks. In this example, an ‘open’ morphological transformation is used so that the transformed masks (green) trace the contour of the original masks (red). This transformation can be useful for contour measurement. H. Section analysis. Line section (olive), freehand section (green), and distance section (blue) are shown. I. Profiles of the sections (top), the lengths of the sections (top inset), and measurements of coordinates and distances (bottom table) between markers (vertical lines).
E. Data Management and High-throughput Analysis
Few AFM programs have built-in data management capabilities. Measurement data are usually exported to be processed by external statistical analysis and graphing programs. Some AFM programs (Table 3.1) are built into numerical computing platforms such as Igor Pro (Asylum Research software) and MATLAB (e.g. FiberApp and SFMetrics), and therefore allow
for direct data manipulation, analyses, and graphing using the capabilities provided by their