Henry v Apache represents another hidden danger in the developing areas of electronic discovery: a lack of familiarity with the specifics of the underlying technologies used in the search system. The problem found here, the Defense’s failure to translate the shorthand notation into a valid search query, shares pitfalls with the earlier example involving saltwater. To the human eye, a consistent notation which is symbolically plausible can easily escape notice even though it is erroneous. This problem is compounded when we consider that the \x notation may be valid in other systems; in fact, it is valid in Westlaw, the legal search engine commonly used by attorneys. If we assume this is why the error was not caught (familiarity with search systems for which the notation is valid, and a naive assumption that this syntax is universal), it still leaves the troubling fact that the error was not corrected even after clarification was requested regarding the notation’s meaning and use in the Concordance searches.

Figure 7.1.1: Screen capture of the \x notation working in Westlaw.

I will now further refine the functional representation of an act of searching I introduced supra, extending it to include the normal practice of conducting numerous searches in discovery. (Affidavit, p 4) Recall the representation $S_j(Q_i, A_j(U, \{\text{Corpus}\})) = \{\text{Result}\}$. This represents one search query $Q_1$, which in the case here may have been “cameron near3 field”. There were twenty searches using invalid notation cited in my affidavit here. (Affidavit, p 4) We might term these $\{Q_1, Q_2, Q_3, \ldots, Q_{19}\}$ or more generally $\{Q_1, Q_2, Q_3, \ldots, Q_n\}$. It would make sense, then, to represent the searches for these revised queries as:

$$\sum_{i=1}^{n} S_j(Q_i, A_j(U, \{\text{Corpus}\})) = \{\text{Result}\}$$

Figure 7.1.2: Functional representation of a series of searches in Discovery.
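The series of searches can be sketched in code. What follows is a minimal illustration under stated assumptions, not a formal definition: the corpus is assumed to be a small dictionary of documents, `access` is a hypothetical stand-in for $A_j$ that filters on a per-document list of permitted users, `search` is a hypothetical stand-in for $S_j$ that uses a naive all-terms match, and the summation is read as a union of result sets.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy corpus: document id -> (text, users allowed to read it).
Corpus = Dict[str, Tuple[str, Set[str]]]


def access(user: str, corpus: Corpus) -> Dict[str, str]:
    """Stand-in for A_j(U, {Corpus}): the documents visible to this user."""
    return {doc_id: text
            for doc_id, (text, readers) in corpus.items()
            if user in readers}


def search(query: str, visible: Dict[str, str]) -> Set[str]:
    """Stand-in for S_j: ids of documents containing every query term."""
    terms = query.lower().split()
    return {doc_id for doc_id, text in visible.items()
            if all(term in text.lower().split() for term in terms)}


def run_queries(queries: Iterable[str], user: str, corpus: Corpus) -> Set[str]:
    """Union of S_j(Q_i, A_j(U, {Corpus})) over the queries Q_1..Q_n."""
    visible = access(user, corpus)
    result: Set[str] = set()
    for query in queries:
        result |= search(query, visible)
    return result


corpus: Corpus = {
    "d1": ("Cameron field report", {"analyst", "admin"}),
    "d2": ("saltwater disposal log", {"admin"}),
    "d3": ("quarterly budget", {"analyst", "admin"}),
}
queries = ["cameron field", "saltwater"]

# The same queries return different result sets under different credentials.
print(sorted(run_queries(queries, "analyst", corpus)))
print(sorted(run_queries(queries, "admin", corpus)))
```

Separating `access` from `search` makes explicit that identical queries can yield different result sets purely because of the user's credentials.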

Considering now the problem with syntax translation, we might assume that the individual specifying the search had a boolean query in mind but lacked knowledge of the search system (or was unaware the queries were expressed in the wrong system’s syntax). The functional representation may then be further refined by introducing a function $T_j$ which will translate any query $Q_i$ into the syntax of system $S_j$ while preserving its logical representation:

$$\sum_{i=1}^{n} S_j(T_j(Q_i), A_j(U, \{\text{Corpus}\})) = \{\text{Result}\}$$

Figure 7.1.3: Revised functional representation of a series of searches in Discovery.
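A translation function of this kind might be sketched as follows. The exact form of the shorthand is an assumption: the sketch supposes a Westlaw-style numeric proximity connector written as `term /N term`, and a target syntax of `term nearN term` modeled on the “cameron near3 field” example. The names `translate_to_near` and `is_valid_for_target` are hypothetical, and a real translator would need to cover the full grammar of both systems.

```python
import re

# Assumed shorthand: "term /N term" (numeric proximity connector).
# Assumed target form: "term nearN term", as in "cameron near3 field".
_PROXIMITY = re.compile(r"\s*/(\d+)\s*")


def translate_to_near(query: str) -> str:
    """Hypothetical T_j: rewrite each '/N' connector as 'nearN'."""
    return _PROXIMITY.sub(lambda m: f" near{m.group(1)} ", query).strip()


def is_valid_for_target(query: str) -> bool:
    """Crude check that no untranslated '/N' connector remains."""
    return re.search(r"/\d+", query) is None


shorthand = "cameron /3 field"
translated = translate_to_near(shorthand)
print(translated)
print(is_valid_for_target(shorthand), is_valid_for_target(translated))
```

Running a validity check of this sort before every search is one inexpensive way to catch a query that is symbolically plausible but invalid in the target system.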

As the discovery process is reduced to a functional representation, we see the complexity more clearly. The preceding case studies deal with conflicts over a single, specific system $S_j$ because, in those instances, that was the only system involved in the conflict. In the discovery process there is more likely to be a larger, heterogeneous set of systems needing to be searched. As we enumerated the queries, so too should we enumerate the search systems $\{S_1, S_2, S_3, \ldots, S_m\}$. This leads naturally to a final refinement of the functional representation:

$$\sum_{j=1}^{m} \sum_{i=1}^{n} S_j(T_j(Q_i), A_j(U, \{\text{Corpus}\}_j)) = \{\text{Result}\}$$

Figure 7.1.4: Functional representation of a series of searches in Discovery conducted over a series of searchable systems.
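The double summation corresponds to a nested loop: an outer loop over systems and an inner loop over queries. The sketch below is illustrative only. `SearchSystem` is a hypothetical bundle of $S_j$, $T_j$ and $\{\text{Corpus}\}_j$, the per-system corpus is assumed to be already filtered by $A_j$, and the toy translators and substring search stand in for real system behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class SearchSystem:
    """Hypothetical bundle of S_j, T_j and {Corpus}_j for one system."""
    name: str
    translate: Callable[[str], str]                     # T_j
    search: Callable[[str, Dict[str, str]], Set[str]]   # S_j
    corpus: Dict[str, str]          # {Corpus}_j, assumed access-filtered


def substring_search(query: str, corpus: Dict[str, str]) -> Set[str]:
    """Toy S_j: ids of documents whose text contains the query string."""
    return {doc_id for doc_id, text in corpus.items() if query in text}


def run_discovery(systems: List[SearchSystem],
                  queries: List[str]) -> Tuple[Set[str], int]:
    """Outer loop over systems j = 1..m, inner loop over queries i = 1..n."""
    results: Set[str] = set()
    runs = 0
    for system in systems:
        for query in queries:
            hits = system.search(system.translate(query), system.corpus)
            # Prefix ids with the system name so each hit stays traceable.
            results |= {f"{system.name}:{doc_id}" for doc_id in hits}
            runs += 1
    return results, runs


systems = [
    SearchSystem("documents", str.lower, substring_search,
                 {"d1": "cameron field report", "d2": "budget"}),
    SearchSystem("email", str.upper, substring_search,
                 {"m1": "RE: CAMERON FIELD", "m2": "LUNCH"}),
]
results, runs = run_discovery(systems, ["Cameron", "Saltwater"])
print(sorted(results), runs)
```

Each system carries its own translator, so a single logical query is expressed once and rendered separately for every system it is run against.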

Now the complexity continues to increase. We can describe the complexity in terms of $\#\{S\}$, $\#\{Q\}$, and $\#\{\text{Corpus}\}$, generically resulting in $T(n) = O(n^3)$ as a base.[5] If $\#\{S\}$ is reduced to 1 then $T(n) = O(n^2)$. The only unknown complexity is, ergo, the individual complexity of a $Q_i$ run in $S_j$; where such complexity exceeds that of the system, or worse pushes it to another complexity class, the query might be termed either not feasible or in need of revision. This is often the case with $Q_i$ which use wild card prefix or postfix notation.[6]
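One way to arrive at these bounds, offered as a sketch rather than a formal proof, is to assume that every query is compared against every document of every system's corpus at a constant cost $c$ per comparison. The constant $c$ and the uniform bound $n$ are assumptions introduced here for illustration.

```latex
\begin{align*}
T &= \sum_{j=1}^{m} \sum_{i=1}^{n} c \cdot \#\{\text{Corpus}\}_j
   \;\le\; c \cdot \#\{S\} \cdot \#\{Q\} \cdot \max_{j} \#\{\text{Corpus}\}_j \\
\#\{S\},\ \#\{Q\},\ \#\{\text{Corpus}\}_j \le n
  &\;\Longrightarrow\; T(n) \le c\,n^{3} = O(n^{3}) \\
\#\{S\} = 1
  &\;\Longrightarrow\; T(n) \le c\,n^{2} = O(n^{2})
\end{align*}
```

If a particular $Q_i$ costs more than a constant per document, that extra factor multiplies the bound, which is the unknown term discussed above.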

The case studies supra examine disputes where there is a single, pre-existing document corpus and search system. This is not generally the case, and was not the entire case in Harris v BP. Not included in that case study was a dispute over email databases, of which there was potentially both a legacy system and a modern system. In general, a discovery will encompass document database systems, email systems, documents on individual computer systems and the like. Some of these data systems may have $S_j$ coupled with the $\{\text{Corpus}\}_j$, but others will just be an unbound $\{\text{Corpus}\}$ lacking an $S$ to search with. As I will introduce infra, there are numerous systems available for review, but none in the Open Source tool chain. I seek to cure this in Part III.

[5] Each $\{\text{Corpus}\}_j$ is specific to its $S_j$; otherwise, running every $\{\text{Corpus}\}_k$ through $S_j$ would increase the bound to $T(n) = O(n^4)$ and be highly duplicative.

[6] In my experience, lawyers often call this root expansion in the latter case. I dislike the root expansion terminology because in some systems root expansion and a postfix wildcard may be identical, but in others root expansion may rely on some linguistic sense of the root, exempli gratia Base* expanding to bases and baseball, but not to a non-dictionary term such as BASE3114. In cases where root expansion has such a meaning, there may be a real difference between root expansion and wild card.
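The distinction drawn in this footnote can be illustrated with a small sketch. It is a simplification under stated assumptions: the index vocabulary and dictionary are hypothetical, the postfix wildcard is modeled as a case-insensitive prefix match, and linguistic root expansion is approximated by restricting that match to dictionary words. Real systems may implement either behaviour differently.

```python
import re
from typing import Iterable, List, Set

# Hypothetical index vocabulary; BASE3114 is the non-dictionary term
# used as the example in the footnote.
index_terms = ["base", "bases", "baseball", "BASE3114", "bass"]
dictionary = {"base", "bases", "baseball", "bass"}


def postfix_wildcard(stem: str, terms: Iterable[str]) -> List[str]:
    """Treat 'stem*' as a pure string prefix match, ignoring case."""
    pattern = re.compile(re.escape(stem) + r".*", re.IGNORECASE)
    return [term for term in terms if pattern.fullmatch(term)]


def root_expansion(stem: str, terms: Iterable[str],
                   words: Set[str]) -> List[str]:
    """Approximate linguistic expansion: prefix match on dictionary words."""
    return [term for term in postfix_wildcard(stem, terms)
            if term.lower() in words]


print(postfix_wildcard("Base", index_terms))
print(root_expansion("Base", index_terms, dictionary))
```

Under these assumptions the wildcard reaches the non-dictionary token while the dictionary-restricted expansion does not, so the two result sets can differ on the same corpus.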

Case Study Conclusions

The three case studies presented herein provide a novel view of problems arising in discovery, the causes of those problems, and a view over time showing the problems are not being addressed. The Harris case involves two major oil companies as defendants; these companies have vast resources at their disposal, and sophisticated legal teams regularly handling litigation. At both BP and Conoco there is a clear deficiency in the institutional knowledge possessed by the respective defense legal teams. The Livelink system returned different result set sizes on identical queries run under the same user, and dramatically larger result set sizes when those queries were rerun with super user credentials. These problems show not only that a sufficient search is not being conducted, but also that the defense is mistaken in its assumptions to the contrary.

The VPSB case shows similar concerns about user access and search query sufficiency, but with Chevron. In both the opposition brief and at oral argument, the defendant’s counsel argue not about why they think their searches are sufficient, but instead argue that the quantity of documents produced should indicate sufficient searches. However, as demonstrated by the events in Harris, the quantity of documents is not relevant if poor search practices or insufficient user access are present. A bad search is, by analogy, like leaving rooms uninvestigated because other rooms have already been found.

Apache brings the case studies full circle back to BP again. Again, concerns about search query sufficiency are raised regarding the discovery searches. In addition to inadequately designed searches, I discovered that invalid syntax was being used, thus invalidating the results of the majority of the searches the defense claimed to have run. In other words, even if the searches were adequate in their scope, the results would still be wrong because the searches were translated into invalid syntax.

The flawed practices I detailed in the three case studies went undetected by numerous sophisticated parties (IT personnel, in-house counsel, outside counsel, and litigation support personnel) in three very large, well-funded organizations. The problems were not cured, as we see from BP’s repeat failure in Apache after being put on notice by the events in Harris. This leads to the conclusion that the problems are real, are pervasive, and are not being addressed. Because the discovery process leaves the opposing side with limited knowledge of how searches are conducted, significant technical knowledge is needed to detect these

Part III

Chapter 8

Black Friar

8.1 Background of the Field

At present the field has three main phases of practice: acquisition, analysis, and reporting. Acquisition originated in dead acquisition, where the system is powered off and the data storage medium, such as a hard drive, is imaged byte-for-byte to produce an exact duplicate. The duplicate is hashed for later verification after analysis is complete. In a more modern twist, live acquisition involves acquiring data from a system while it is still running. Live acquisition allows for preserving more ephemeral data such as memory dumps, active network connections, logged-on users, running programs, etc., which would otherwise be lost in powering the system down for dead acquisition. Live acquisition risks triggering anti-forensics tools, malicious commands from still-logged-in users, and damage to the system state.
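The hash-then-verify step can be sketched briefly. The choice of SHA-256 is an assumption, since no algorithm is named here, and the function names `hash_image` and `verify_image` are hypothetical. The demonstration uses a small temporary file as a stand-in for a real disk image.

```python
import hashlib
import os
import tempfile


def hash_image(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: str, recorded_digest: str) -> bool:
    """Check that the image still matches the digest taken at acquisition."""
    return hash_image(path) == recorded_digest


# Stand-in for an acquired image: a temporary file of zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"\x00" * 4096)
    image_path = handle.name

recorded = hash_image(image_path)            # digest taken at acquisition
unchanged = verify_image(image_path, recorded)

with open(image_path, "ab") as image:        # simulate a later modification
    image.write(b"tampered")
changed = verify_image(image_path, recorded)

os.remove(image_path)
print(unchanged, changed)
```

Reading in fixed-size chunks keeps memory use constant, which matters because a real image may be far larger than available memory.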

However the storage data is acquired, the image files are transferred to a tool suite where the examiner/analyst/investigator/researcher, etc., begins the careful analysis process. Depending on the size of the datasets, analysis may take a very long time to complete. Whether the examiner uses a tool suite or a collection of individual tools, some common tasks will be carried out. Files will be hashed to exclude known good files (operating system libraries, known executables, etc.), suspicious files will be flagged for closer scrutiny (erroneous file extensions, unexpectedly large files, encrypted or password-protected files), and text will be searched for relevant keywords. When the examiner finishes the analysis, a report is drafted summarizing and interpreting the evidence, ready for consumption by lawyers, law enforcement, and the courts.
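The common analysis tasks described above (excluding known good files by hash, flagging suspicious files, and keyword searching) might be sketched as a single triage pass. This is an illustration under assumptions, not a description of any particular tool suite: files are modeled as an in-memory dictionary, SHA-256 is an assumed hash, the only extension check is a `.txt` file that begins with the `MZ` executable signature, and the size limit is arbitrary.

```python
import hashlib
from typing import Dict, List, Set


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def triage(files: Dict[str, bytes], known_good: Set[str],
           keywords: List[str], size_limit: int = 1_000_000) -> dict:
    """Exclude known good files, flag suspicious ones, search the rest."""
    report = {"excluded": [], "flagged": [], "keyword_hits": {}}
    for name, data in sorted(files.items()):
        if sha256_hex(data) in known_good:
            report["excluded"].append(name)   # known good: no further review
            continue
        # Flag a text extension hiding an executable, or an oversized file.
        disguised = name.lower().endswith(".txt") and data.startswith(b"MZ")
        if disguised or len(data) > size_limit:
            report["flagged"].append(name)
        text = data.decode("utf-8", errors="ignore").lower()
        hits = [word for word in keywords if word.lower() in text]
        if hits:
            report["keyword_hits"][name] = hits
    return report


files = {
    "kernel32.dll": b"MZ system library",
    "notes.txt": b"Meeting about the Cameron field",
    "readme.txt": b"MZ\x90\x00 disguised executable",
}
known_good = {sha256_hex(b"MZ system library")}
report = triage(files, known_good, ["cameron"])
print(report)
```

Excluding known good files first shrinks the set that needs the more expensive checks, which is the reason hashing typically precedes keyword searching.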