DISCUSIÓN - FACULTAD DE CIENCIAS DE LA SALUD

Often the code in management-style classes such as ZipReplace is quite generic and can be applied in many different ways. We can use either composition or inheritance to keep this code in one place, thus eliminating duplicate code.

Before we look at any examples of this, let's discuss a tiny bit of theory. Specifically: why is duplicate code a bad thing?

There are several reasons, but they all boil down to readability and maintainability.

When we're writing a new piece of code that is similar to an earlier piece, the easiest thing to do is copy the old code and change whatever needs to change (variable names, logic, comments) to make it work in the new location. Alternatively, if we're writing new code that seems similar, but not identical to code elsewhere in the project, the easiest thing to do is write fresh code with similar behavior, rather than figure out how to extract the overlapping functionality.

But as soon as someone has to read and understand the code and they come across duplicate blocks, they are faced with a dilemma. Code that might have made sense suddenly has to be understood. How is one section different from the other? How are they the same? Under what conditions is one section called? When do we call the other? You might argue that you're the only one reading your code, but if you don't touch that code for eight months it will be as incomprehensible to you as to a fresh coder. When we're trying to read two similar pieces of code, we have to understand why they're different, as well as how they're different. This wastes the reader's time; code should always be written to be readable first.

I once had to try to understand someone's code that had three identical copies of the same three hundred lines of very poorly written code. I had been working with the code for a month before I realized that the three "identical" versions were actually performing slightly different tax calculations. Some of the subtle differences were intentional, but there were also obvious areas where someone had updated a calculation in one function without updating the other two. The number of subtle, incomprehensible bugs in the code could not be counted.

Reading such duplicate code can be tiresome, but code maintenance is an even greater torment. As the preceding story suggests, keeping two similar pieces of code up to date can be a nightmare. We have to remember to update both sections whenever we update one of them, and we have to remember how the multiple sections differ so we can modify our changes when we are editing each of them. If we forget to update both sections, we will end up with extremely annoying bugs that usually manifest themselves as, "but I fixed that already, why is it still happening?"

The result is that people who are reading or maintaining our code have to spend astronomical amounts of time understanding and testing it compared to if we had written the code in a non-repetitive manner in the first place. It's even more frustrating when we are the ones doing the maintenance. The time we save by copy-pasting existing code is lost the very first time we have to maintain it. Code is both read and maintained many more times and much more often than it is written.

Comprehensible code should always be paramount.

This is why programmers, especially Python programmers (who tend to value elegant code more than average), follow what is known as the Don't Repeat Yourself, or DRY, principle. DRY code is maintainable code. My advice to beginning programmers is to never use the copy and paste feature of their editor. To intermediate programmers, I suggest they think thrice before they hit Ctrl + C.

But what should we do instead of code duplication? The simplest solution is often to move the code into a function that accepts parameters to account for whatever sections are different. This isn't a terribly object-oriented solution, but it is frequently sufficient. For example, if we have two pieces of code that unzip a ZIP file into two different directories, we can easily write a function that accepts a parameter for the directory to which it should be unzipped instead. This may make the function itself slightly more difficult to read, but a good function name and docstring can easily make up for that, and any code that invokes the function will be easier to read.
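A minimal sketch of that idea follows. The helper name `unzip_to` and its signature are hypothetical, chosen only for illustration; they do not come from the earlier code.

```python
import zipfile
from pathlib import Path


def unzip_to(zip_path, target_directory):
    """Extract every member of the ZIP file at zip_path into target_directory.

    Hypothetical helper: the destination is a parameter, so one function
    serves every caller instead of one copy of the code per directory.
    """
    target = Path(target_directory)
    # Create the destination if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        # Report which members were unpacked.
        return archive.namelist()
```

Two call sites that used to duplicate the unzip logic would shrink to something like `unzip_to("archive.zip", "first_dir")` and `unzip_to("archive.zip", "second_dir")`.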

That's certainly enough theory! The moral of the story is: always make the effort to refactor your code to be easier to read instead of writing bad code that is only easier to write.

In practice

Let's explore two ways we can reuse existing code. After writing our code to replace strings in a ZIP file full of text files, we are later contracted to scale all the images in a ZIP file to 640x480. Looks like we could use a very similar paradigm to what we used in ZipReplace. The first impulse, obviously, would be to save a copy of that file and change the find_replace method to scale_image or something similar. But, that's just not cool. What if someday we want to change the unzip and zip methods to also open TAR files? Or maybe we want to use a guaranteed unique directory name for temporary files. In either case, we'd have to change it in two different places!

We'll start by demonstrating an inheritance-based solution to this problem. First we'll modify our original ZipReplace class into a superclass for processing generic ZIP files:

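    import os
    import shutil
    import zipfile


    class ZipProcessor:
        def __init__(self, zipname):
            self.zipname = zipname
            # Strip the ".zip" extension to name the working directory.
            self.temp_directory = "unzipped-{}".format(
                zipname[:-4])

        def _full_filename(self, filename):
            return os.path.join(self.temp_directory, filename)

        def process_zip(self):
            self.unzip_files()
            self.process_files()
            self.zip_files()

        def unzip_files(self):
            os.mkdir(self.temp_directory)
            # Assumed reconstruction: the extraction step is missing
            # from this listing; extractall is one way to write it.
            with zipfile.ZipFile(self.zipname) as file:
                file.extractall(self.temp_directory)

        def zip_files(self):
            # Assumed reconstruction: reopen the archive for writing.
            with zipfile.ZipFile(self.zipname, "w") as file:
                for filename in os.listdir(self.temp_directory):
                    file.write(self._full_filename(
                        filename), filename)
            shutil.rmtree(self.temp_directory)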

We changed the filename property to zipname to avoid confusion with the filename local variables inside the various methods. This helps make the code more readable even though it isn't actually a change in design. We also dropped the two parameters to __init__ (search_string and replace_string) that were specific to ZipReplace. Then we renamed the zip_find_replace method to process_zip and made it call an (as yet undefined) process_files method instead of find_replace; these name changes help demonstrate the more generalized nature of our new class.

Notice that we have removed the find_replace method altogether; that code is specific to ZipReplace and has no business here.

This new ZipProcessor class doesn't actually define a process_files method, so if we ran it directly, process_zip would raise an AttributeError as soon as it reached the missing method. Since the class isn't meant to be run directly, we also removed the main call at the bottom of the original script.
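To make that failure concrete, here is a stripped-down sketch. The class names `BareProcessor` and `ExplicitProcessor` and the helper `failure_name` are hypothetical, and the real zipping logic is omitted. The second class shows a common alternative, offered as an option rather than something the original class does: a stub that raises `NotImplementedError` with a clearer message.

```python
class BareProcessor:
    """Stand-in for ZipProcessor with the file handling stripped out."""

    def process_zip(self):
        # Template method: the middle step is expected from a subclass.
        self.process_files()


class ExplicitProcessor(BareProcessor):
    """Optional variant: a stub that states the contract explicitly."""

    def process_files(self):
        raise NotImplementedError("subclasses must implement process_files")


def failure_name(processor):
    """Return the name of the exception raised by process_zip, or None."""
    try:
        processor.process_zip()
    except Exception as error:
        return type(error).__name__
    return None


print(failure_name(BareProcessor()))      # AttributeError
print(failure_name(ExplicitProcessor()))  # NotImplementedError
```

Either way the base class cannot do useful work on its own; the stub only makes the error message say why.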

Now, before we move on to our image processing app, let's fix up our original zipsearch to make use of this parent class:

    from zip_processor import ZipProcessor
    import sys
    import os


    class ZipReplace(ZipProcessor):
        def __init__(self, filename, search_string, replace_string):
            super().__init__(filename)
            self.search_string = search_string
            self.replace_string = replace_string

        def process_files(self):
            '''perform a search and replace on all files in the
            temporary directory'''
            for filename in os.listdir(self.temp_directory):
                with open(self._full_filename(filename)) as file:
                    contents = file.read()
                contents = contents.replace(
                    self.search_string, self.replace_string)
                with open(
                        self._full_filename(filename), "w") as file:
                    file.write(contents)


    if __name__ == "__main__":
        ZipReplace(*sys.argv[1:4]).process_zip()

This code is a bit shorter than the original version, since it inherits its ZIP processing abilities from the parent class. We first import the base class we just wrote and make ZipReplace extend that class. Then we use super() to initialize the parent class.

The find_replace method is still here, but we renamed it to process_files so the parent class can call it. Because this name isn't as descriptive as the old one, we added a docstring to describe what it is doing.

Now, that was quite a bit of work, considering that all we have now is a program that is functionally no different from the one we started with! But having done that work, it is now much easier for us to write other classes that operate on files in a ZIP archive, such as our photo scaler. Further, if we ever want to improve the ZIP functionality, we can do it for all classes by changing only the one ZipProcessor base class. Maintenance will be much more effective.
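As a sketch of that single point of change, suppose we wanted the guaranteed unique temporary directory mentioned earlier. The version below is an assumption about how it might look, not the class as written above: `TempDirZipProcessor` and `UppercaseZip` are hypothetical names, and the standard library's `tempfile.mkdtemp` replaces the directory name derived from the ZIP filename. The subclass needs no changes to pick up the new behavior.

```python
import os
import shutil
import tempfile
import zipfile


class TempDirZipProcessor:
    """Hypothetical ZipProcessor variant using a unique temp directory."""

    def __init__(self, zipname):
        self.zipname = zipname
        self.temp_directory = None  # created in unzip_files

    def _full_filename(self, filename):
        return os.path.join(self.temp_directory, filename)

    def process_zip(self):
        self.unzip_files()
        self.process_files()
        self.zip_files()

    def unzip_files(self):
        # mkdtemp creates a new directory with a unique name.
        self.temp_directory = tempfile.mkdtemp(prefix="unzipped-")
        with zipfile.ZipFile(self.zipname) as archive:
            archive.extractall(self.temp_directory)

    def zip_files(self):
        with zipfile.ZipFile(self.zipname, "w") as archive:
            for filename in os.listdir(self.temp_directory):
                archive.write(self._full_filename(filename), filename)
        shutil.rmtree(self.temp_directory)


class UppercaseZip(TempDirZipProcessor):
    """Illustrative subclass: upper-cases every text file in the archive."""

    def process_files(self):
        for filename in os.listdir(self.temp_directory):
            with open(self._full_filename(filename)) as file:
                contents = file.read()
            with open(self._full_filename(filename), "w") as file:
                file.write(contents.upper())
```

Like the original, this sketch assumes a flat archive of text files. Only `unzip_files` had to change; every subclass that goes through `_full_filename` keeps working.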

See how simple it is now to create a photo scaling class that takes advantage of the ZipProcessor functionality. (Note: this class requires the third-party pygame library to be installed. You can download it from http://www.pygame.org/.)

    from zip_processor import ZipProcessor
    import os
    import sys
    from pygame import image
    from pygame.transform import scale


    class ScaleZip(ZipProcessor):
        def process_files(self):
            '''Scale each image in the directory to 640x480'''
            for filename in os.listdir(self.temp_directory):
                im = image.load(self._full_filename(filename))
                scaled = scale(im, (640, 480))
                image.save(scaled, self._full_filename(filename))


    if __name__ == "__main__":
        # ZipProcessor.__init__ takes only the ZIP filename.
        ScaleZip(sys.argv[1]).process_zip()

All that work we did earlier paid off! Look how simple this class is! All we do is open each file (assuming that it is an image; it will unceremoniously crash if the file cannot be opened), scale it, and save it back. The ZipProcessor takes care of the zipping and unzipping without any extra work on our part.
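One way to soften that crash, sketched under the assumption that file extensions are a good enough signal, is to filter the directory listing before handing anything to pygame. The helper name `image_filenames` and the extension set are hypothetical choices, not part of the original class.

```python
import os

# Assumed set of extensions; adjust to whatever formats are expected.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def image_filenames(directory):
    """Return the sorted names in directory that look like image files."""
    return sorted(
        filename
        for filename in os.listdir(directory)
        if os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS
    )
```

Inside process_files, the loop could then iterate over `image_filenames(self.temp_directory)` instead of `os.listdir(self.temp_directory)`. Files skipped this way stay in the temporary directory, so they are still zipped back unchanged.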
