
still satisfying the rest of the regular expression. Let’s assume we have the following log entry:

Apr 17 08:22:27 rmarty kernel: Output IN= OUT=vmnet8 SRC=192.168.170.1 DST=192.168.170.255 LEN=258 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=138 DPT=138 LEN=238

We want to extract the output interface from this log entry. The first attempt would probably be something like this:

perl -pe 's/.*(OUT=.*) .*/\1/'

More or less surprisingly, this does not give us the right output, but instead it extracts all of the following:

OUT=vmnet8 SRC=192.168.170.1 DST=192.168.170.255 LEN=258 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=138 DPT=138

The correct way of extracting the interface is to use a nongreedy match for the interface name:

perl -pe 's/.*(OUT=.*?) .*/\1/'

This command correctly extracts the interface: OUT=vmnet8. So, be careful to avoid unwanted greedy matches!
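Tools such as sed and grep use POSIX regular expressions, which have no nongreedy quantifier. The usual workaround there is a negated character class in place of .*?. The following is a minimal sketch, assuming the log entry is stored on a single line; the variable names are illustrative only:

```shell
# Sample entry from the text, kept on one line (the variable name is illustrative).
entry='Apr 17 08:22:27 rmarty kernel: Output IN= OUT=vmnet8 SRC=192.168.170.1 DST=192.168.170.255 LEN=258 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=138 DPT=138 LEN=238'

# [^ ]* can never run past a space, so the capture stops at the end of
# the interface name even though the quantifier itself is greedy.
out_if=$(printf '%s\n' "$entry" | sed 's/.*\(OUT=[^ ]*\).*/\1/')
echo "$out_if"    # prints OUT=vmnet8
```

The same negated class also works in Perl and is often easier to read than a nongreedy match.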

UNIX TOOLS

UNIX, or Linux for that matter, comes with a set of very powerful and usually fairly simple-to-use tools that are of great assistance when dealing with log processing. The tools I am using here are also available for Microsoft Windows. One commonly used package for Windows is cygwin [13]. Another one is UNIX Utils, which provides tools such as grep, sed, and awk. Let’s take a look at each of these three tools separately and see how they can help us with processing log files.

[13] http://cygwin.org

grep

Probably the simplest UNIX utility when it comes to data processing is grep. When used on a file, it extracts the entries that match the search condition given as an argument:

grep "connection:" logfile.log

The preceding command will look through the file logfile.log and extract all the lines that mention the connection: string. All the other lines will not be shown. The inverse can be done by using the following command:

grep -v "connection:" logfile.log

This command excludes all the lines that contain the string and shows only the remaining ones. Instead of just using simple strings, you can use regular expressions to extract lines that match a more complex pattern. grep is generally used to filter entire log entries and does not modify individual log entries. However, you could use the -o parameter to extract portions of a log entry, which sometimes proves very helpful.
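The three grep modes described above can be compared side by side. This is only a sketch: the log lines are hypothetical and are written to a temporary file for the illustration. Note that -o is not part of POSIX grep, although GNU, BSD, and BusyBox grep all provide it.

```shell
# Hypothetical sample log, written to a temporary file for this illustration only.
log=$(mktemp)
printf '%s\n' \
  'May 9 13:40:26 host tcpspy[5198]: connection: user privoxy, remote 81.92.97.244:www' \
  'May 9 13:40:27 host cron[612]: job started' \
  'May 9 13:40:28 host tcpspy[5198]: connection: user root, remote 10.0.0.7:ssh' > "$log"

kept=$(grep 'connection:' "$log")        # lines that contain the string
dropped=$(grep -v 'connection:' "$log")  # every other line
users=$(grep -o 'user [a-z]*' "$log")    # only the matched text, one match per output line

printf '%s\n' "$dropped"
printf '%s\n' "$users"
rm -f "$log"
```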

awk

If you need to parse log entries or reformat them, awk is one candidate tool. Whenever a log file has a simple structure (in other words, a specific separator can be used to break a log entry into multiple chunks), awk is the tool you want to use. What do I mean by that? Assume you have the following type of log entry:

May 9 13:40:26 ram-laptop tcpspy[5198]: connect: proc (unknown), user privoxy, local 192.168.81.135:52337, remote 81.92.97.244:www

One way to extract specific fields is to use a full-blown parser or a regular expression. However, we can very simply use awk to extract, for example, all the users from the log entry:

awk '{print $10}' logfile.log

This simple command prints the tenth token. By default, awk uses any number of white spaces (spaces or tabs) as separators for the tokens. You can use the dollar notation to reference specific fields to operate on. The first token is denoted by $1. The next one is $2, and so on.
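Because awk splits on white space only, the tenth token of the sample entry still carries the comma that follows the user name. A small sketch of one way to strip it, using awk's sub() function on the field (the variable names are illustrative):

```shell
entry='May 9 13:40:26 ram-laptop tcpspy[5198]: connect: proc (unknown), user privoxy, local 192.168.81.135:52337, remote 81.92.97.244:www'

# Tenth whitespace-separated token, exactly as awk sees it.
raw=$(printf '%s\n' "$entry" | awk '{ print $10 }')

# Same token with the trailing comma removed before printing.
user=$(printf '%s\n' "$entry" | awk '{ sub(/,$/, "", $10); print $10 }')

echo "$raw"     # prints privoxy,
echo "$user"    # prints privoxy
```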

Another sample use for awk is if you have a CSV file and you want to rearrange its columns. The first column should really be swapped with the second one. What you need to do is first change the field separator to be a comma rather than white spaces. You can do this with the -F parameter. Here is the command to swap the first column with the second one:

awk -F, '{printf"%s,%s,%s\n",$2,$1,$3}' logfile.log

The command assumes that you have a CSV file with three columns. You get the idea. A slightly different way of doing the same thing is the following:

awk -F, -v OFS=, '{tmp=$1; $1=$2; $2=tmp; print}'

This changes the output field separator (OFS) to a comma, and then simply switches the first and second columns around before printing the record again. There are various other uses for awk, but these are the two you will most likely be using.
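The two column-swapping variants can be checked against each other on a small input. The CSV data below is made up for the illustration. The sketch also shows a practical difference: the OFS variant keeps any additional columns, whereas the printf variant only ever prints three.

```shell
# Made-up three-column CSV data.
csv='a,b,c
1,2,3'

fixed=$(printf '%s\n' "$csv" | awk -F, '{ printf "%s,%s,%s\n", $2, $1, $3 }')
general=$(printf '%s\n' "$csv" | awk -F, -v OFS=, '{ tmp = $1; $1 = $2; $2 = tmp; print }')

# With a fourth column, only the OFS variant preserves the whole record.
wide=$(printf '%s\n' 'a,b,c,d' | awk -F, -v OFS=, '{ tmp = $1; $1 = $2; $2 = tmp; print }')

printf '%s\n' "$fixed"
printf '%s\n' "$wide"    # prints b,a,c,d
```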

sed

Another tool that I use fairly often is sed. Whenever a log entry does not have a simple structure and I cannot identify a simple separator such that I could use awk, I use sed.

With sed, I can specify a regular expression to extract the pieces of the log entry that I am interested in:

sed -e 's/.*Request: //' -e 's#[/:].*##' logfile.log

This is a fairly complex example already, but let’s break it down into pieces and see what it does. There are two separate commands, each introduced by the -e parameter. Following the parameter, you find a regular expression. The first one just removes the entire beginning of the line up to and including the part of the log entry where it says Request:. In short, it gets rid of the header part. The second command looks somewhat more complicated, but it really is not. What you have to know here is that the first character following the s command is used as the separator for the substitute command. In the first command, I was using a forward slash. In the second command, I use a hash sign (#). Why? Because if I use a forward slash, I have to escape all the forward slashes in the following regular expression, and I want to avoid that. The regex of the second command identifies instances where there is a slash or a colon followed by any number of characters, and it will remove all of that. Here is the original log entry followed by the output after the sed command was applied:

May 09 19:33:10 Privoxy(b45d6b90) Request: sb.google.com/safebrowsing

sb.google.com

In essence, the command extracts domain names from proxy logs.
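To see both branches of the [/:] character class at work, the command can be run against the entry above and against a second entry that carries a port number. The second entry and the extract_domain helper are hypothetical and exist only for this sketch:

```shell
entry='May 09 19:33:10 Privoxy(b45d6b90) Request: sb.google.com/safebrowsing'
# Hypothetical entry with a port number, to exercise the colon branch of [/:].
with_port='May 09 19:33:11 Privoxy(b45d6b90) Request: www.example.com:443/'

# Helper name is illustrative; it wraps the two sed commands from the text.
extract_domain() {
  sed -e 's/.*Request: //' -e 's#[/:].*##'
}

domain=$(printf '%s\n' "$entry" | extract_domain)
domain_port=$(printf '%s\n' "$with_port" | extract_domain)

echo "$domain"         # prints sb.google.com
echo "$domain_port"    # prints www.example.com
```

The order of the two commands matters: the header is removed first, so the colons in the timestamp never reach the second substitution.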

PERL

Instead of using UNIX tools, you could also use a scripting language, such as Perl. I use Perl for a lot of things, such as augmenting log entries with geographic information or doing DNS lookups. You should be aware of three simple cases when using command-line Perl to process log entries. The first case is essentially a grep alternative:

perl -ne 'print if (/^1/)' file.txt

This command looks for all the entries in the file that start with a 1. The next use is a substitute for sed:

perl -pe 's/^ +//' phonelist.txt

This command eliminates all the spaces at the beginning of a line. And finally, another command where you could use sed to extract individual parts of a log entry looks like this:

perl -pe 's/.*(IN=\S+).*/\1/'

This command extracts all the inbound interfaces in an iptables log file.

Two somewhat more complex examples of useful Perl scripts are adding geographic locations for IP addresses and the reverse lookup of IP addresses in DNS to use the hostname rather than the IP address for graphing purposes. Both of these data mapping examples can be implemented with simple Perl one-liners.

[15] http://search.cpan.org/~gmpassos/Geo-IPfree-0.2/

Let’s assume we have the following log file, already formatted in a comma-separated form:

10/13/2005 20:25:54.032145,62.245.243.139,195.131.61.44,2071,135

We want to get the country of the source address (the first IP address in each log entry) and add that at the end of the entry:

cat log.csv | perl -M'Geo::IPfree' -lnaF/,/ -e \
'($country,$country_name)=LookUp($F[1]); print "$_,$country_name "'

This one line produces the following output:

10/13/2005 20:25:54.032145,62.245.243.139,195.131.61.44,2071,135,Europe

Let’s take a closer look at that command. Most important, I am using a Perl module to do the country lookup. The library you need is Geo::IPfree, which you can find on CPAN [15]. To use a library in Perl, you have to use the -M option to load it. Following that, I am using an -n switch to create a loop through the input lines. Every line that is passed to Perl will be passed through the command specified after the -e switch. The other two parameters, -aF/,/, instruct Perl to split each input line, using a comma as a delimiter. The result of the split is assigned to the array @F. The command itself uses the Geo library’s LookUp() function with the second column of the input as an argument, which happens to be the source IP address from our input logs. Using the -l switch, I am stripping newline characters from the original input line so that we can add the $country_name at the end of the line.

In a similar fashion, we can either add or substitute IP addresses with their corresponding hostnames. The command for that looks similar to the Geo lookup:

cat log.csv | perl -M'Socket' -naF/,/ -e \
'$F[1]=gethostbyaddr(inet_aton($F[1]),AF_INET)||$F[1]; $,=","; print @F'

This time, I am substituting the second column with the hostname of the IP address, instead of appending the hostname to the end of the line. The function to look up the DNS host name for an IP address is called gethostbyaddr. It is built into Perl; the inet_aton function and the AF_INET constant that it needs here are provided by the Socket library. I am using a couple of interesting things in this command. The first one is that I am resolving the hostname, and I make sure that if it is empty, I am printing the original IP address, rather than an empty string. That way, if an IP cannot be resolved, it will just stay in the log entry. The other thing I am using is the $, variable, which lets me change the output separator. If I were not setting the output separator to be a comma, the following print command would print all the elements from the array one after the other, without any separating character. However, I would like to have a comma-separated file again, so setting the separator to a comma will do the trick.

PARSERS

If you check Webster for a definition of parser, the following will come up:

A computer program that breaks down text into recognized strings of characters for further analysis.

In our environment, this translates to taking a complex log entry and reducing it to smaller, easier-to-process components. For example, a firewall log file can be parsed into its components: source address, destination address, source port, rule number, and so on. We have used this idea throughout this chapter to extract pieces from the log files that we wanted to graph. The proxy example used simple UNIX commands to do the parsing. This is not necessarily the most efficient way to go about getting to the point of log visualization. It is much easier to reuse a parser that someone else wrote. This is especially true if an application uses various different types of log entries. You can find a lot of parsers on the Internet. One of the best places is secviz.org under the topic of “parser exchange.”

As an example, let’s look at the Sendmail parser. Sendmail logs are annoying if you want to graph email communications. The problem is that Sendmail logs two separate entries for every email, one for the sender and one for the recipient. Here is an example:

Jul 24 21:01:16 rmarty sendmail[17072]: j6P41Gqt017072: from=<[email protected]>, size=650, class=0, nrcpts=1,

Jul 24 21:01:16 rmarty sendmail[17073]: j6P41Gqt017072: to[email protected], ctladdr=<[email protected]> (0/0), delay=00:00:00, xdelay=00:00:00, mailer=local, pri=30881, dsn=2.0.0, stat=Sent

As you can see, the first entry shows the sender piece of the email, and the second one shows the recipient information. To generate a graph from these log entries, we have to merge the two entries. Two entries in the log belong together if they have the same message ID (j6P41Gqt017072). Instead of building some complex way of parsing these messages and doing the match up, we can just reuse the Sendmail parser from secviz.org as follows:

cat /var/log/maillog | sendmail_parser.pl "sender recipient"

The output from this command is a CSV file that consists of sender-recipient pairs. It is now fairly simple to graph this information.
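To make the merging step concrete, here is a rough awk sketch of the idea. It is not the secviz.org parser. It assumes a simplified log in which the message ID is the sixth token, the from= or to= field is the seventh, the sender entry appears before the recipient entry, and each to= field holds a single address. The log lines and addresses are hypothetical.

```shell
# Hypothetical, simplified Sendmail entries for two interleaved messages.
maillog='Jul 24 21:01:16 rmarty sendmail[17072]: j6P41Gqt017072: from=<alice@example.com>, size=650, class=0, nrcpts=1
Jul 24 21:01:17 rmarty sendmail[17080]: j6P41Hab017080: from=<carol@example.com>, size=412, class=0, nrcpts=1
Jul 24 21:01:17 rmarty sendmail[17081]: j6P41Hab017080: to=<dave@example.com>, delay=00:00:00, mailer=local, stat=Sent
Jul 24 21:01:18 rmarty sendmail[17073]: j6P41Gqt017072: to=<bob@example.com>, delay=00:00:00, mailer=local, stat=Sent'

pairs=$(printf '%s\n' "$maillog" | awk '
  {
    id = $6                           # message ID shared by both entries
    addr = $7
    sub(/^(from|to)=<?/, "", addr)    # drop the leading from=< or to=<
    sub(/>?,?$/, "", addr)            # drop the trailing > and comma
  }
  $7 ~ /^from=/ { sender[id] = addr } # remember the sender for this message ID
  $7 ~ /^to=/ && (id in sender) { print sender[id] "," addr }
')

printf '%s\n' "$pairs"
```

Each output line is one sender,recipient pair, in the order in which the recipient entries appear. A real parser also has to cope with multiple recipients and with entries that never get a matching partner, which is a good reason to reuse an existing one.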

If you are in the fortunate position of operating a security information management (SIM) tool in your environment, you can probably make use of the parsing from those tools to generate parsed output. These tools rely heavily on parsing and generally come with a fair amount of device support. The only capability you need from the SIM is to export events into a simple format (CSV, for example).

OTHER TOOLS

Doing log analysis, I find myself writing a lot of little tools to make my life easier. I am trying to share my work so that people do not have to reinvent the wheel and can help me improve my tools. You can find all of my tools as part of the AfterGlow project at http://afterglow.sourceforge.net. If you download a recent version, the tools are located under src/perl/loganalysis, as well as src/perl/parsers. Have fun browsing through them.

SUMMARY

This chapter has taken us through some more of the background we need to tackle our visualization projects. I introduced an information visualization process that consists of six steps. The process starts out by defining the problem that needs to be solved. To solve the problem identified, certain data sources need to be available. In some cases, additional data must be collected. The next step is to process the information and filter it down to the necessary data. Using visual transformations, the data is then mapped into a graph. Various decisions need to be made at this point. Those decisions include not just what type of graph to use but also how to utilize color, shape, and size to best communicate the data properties. In the next step, view transformations can be applied to do a final tweak of what exact part of the data the graph should show. This generally incorporates the process of aggregating certain groups of values into one single cluster. And finally, the graph needs to be interpreted and vetted against the original objective.

The remainder of the chapter showed various simple tricks concerned with how to work with log files. Two of the more interesting applications were doing DNS lookups on the command line and doing IP address location lookups.

With all this information and a new process in our tool box, we can now proceed to visual security analysis, which is the topic of the next chapter.

The beginning of this book introduced all the building blocks necessary to generate graphs from security-related data. I have discussed some of the data sources that you will encounter while analyzing security data. The discussion showed what information each of the sources records and what some of the missing information is. I discussed the different graphs and how to most effectively apply them to your problems, and after that I introduced the information visualization process, which guides you through the steps necessary to generate meaningful visual representations of your data. As the last step of the information visualization process, I briefly touched on the analysis and interpretation of the graphs generated.

This chapter elaborates on that concept and shows different ways of analyzing security data using visual approaches. I separate the topic of graph analysis into three main categories:

• Reporting
• Historical analysis
• Real-time monitoring

Reporting is about communicating and displaying data. Historical analysis covers various aspects of analyzing data collected in the past. The motivations and use-cases are manifold. They range from communicating information to investigate an incident

