Once data reusers acquired the data from either the original investigators or data repositories, they began investigating the data and tried to understand them. Investigating and understanding data are crucial in data reuse because they transfer the contextual information about the data and the original investigators’ knowledge to the data reuser. The investigation and understanding of data differed slightly among participants depending on their workflows or practices, but in general, participants discussed several common procedures they undertook. They tried to understand contextual information about the data by either going through all relevant documentation or by contacting the original investigators (where there was no or insufficient documentation) and also by reading articles published using the data. During this process, participants simultaneously investigated and checked different aspects of the data, such as validity and reliability. While some participants focused on examining and understanding the data, others extracted their variables of interest or created working datasets by merging different variables or by merging data from different sources during this process. During the reuse phase, they sometimes ran into unexpected problems that sent them back to the documentation.

Reading documentation

Reading documentation was one channel of understanding data, and several participants noted the importance of documentation. Documentation is “the key” in data reuse, and PS14 said, “[It] is crucial. If there’s not adequate documentation then we don’t know what’s going on in the data.” Participants read the documentation and looked for contextual information that enhanced their understanding of the data. The level of thoroughness may depend on the participants’ experience, subject expertise, or tacit knowledge, which cannot be captured in this study, but all participants wanted to know how a study was designed (including sampling design); how the data were organized and structured; how the data were collected (often with more detail, such as how interviewers were trained); what changes had been made to the data and cleaning processes used on the data; what the variables meant and how the variables had been recoded; what and how measures or scales were used (if applicable); and how the original investigators had analyzed the data. The quality of documentation was significant for further use of the data, and poor documentation could deter participants from using the data:

PP09: [If] it doesn't contain enough of the information you're looking for, you don't use it.

PS01: If documentation is poor and it doesn’t seem to have the information I’m looking for, that’s probably not worth to move on.

The comprehensiveness of the documentation varied from the participants’ perspectives. In general, participants evaluated documentation of institutional data as “extensive” (PP01), “outstanding” (PP09), “pretty thorough” (PP10), “very comprehensive” (PP13), “pretty complete” (PS03), “phenomenal” (PS15), and “easy to work with” (PP15). They were satisfied with the documentation, except for some data with “sloppy documentation” that “doesn’t make sense” (PS09). Documentation from individual researchers can be “easy to follow,” “very thorough” (PS08), “very detailed” (PP10), “easy to follow and straightforward” (PS10), and “excellent” (PS08, PS19), but data from individual researchers also had documentation that was “poorly described,” “not very well put together” (PS13), and “difficult to understand” (PP03) with a case of “basically no documentation or codebook” (PS15).

Although participants did not uniformly consider one better than the other, several had strong views regarding two types of documentation: documentation associated with publicly available data and documentation obtained from individual researchers upon request. Some participants (e.g., PS15) preferred documentation from publicly available data because it was ready for reuse and was more complete.

PS15: So I think the [data from a government institution], the datasets that are really well-known and used often, you're not gonna probably have that problem where there's a lack of information because that’s used a lot by many people. I don’t think it's the case in the smaller datasets that people (…) haven't used much, [when individual researchers] would like something done with it and they offer it to others.

However, a few participants (e.g., PS07) expressed a strong preference for documentation directly from individual researchers, claiming that sometimes the documentation of publicly available data barely met the requirements of funding agencies and was not adequate for the purposes of reuse.

PS07: [Documentations of publicly available data], it's user-friendly, right? But sometimes they're uploading just... It's usually they're probably assigning it one of their research assistants and saying, “Hey. We need to get this thing done fast.” They just upload whatever documents they have there, so they may not necessarily be cohesive. But if you're using the data [and documentation] that the investigator, he or she, has used to publish, it's usually way richer, [of a] higher quality, understandable, etc.

Both views were valid since they were from the participants’ own experiences. However, the issue was not the type of documentation the participants worked with but the quality and preparedness of the documentation. As long as the documentation was found to be detailed, easily understandable, and well-organized, it was useful whether it was from an institution, from publicly available data, or from individual researchers.

Getting information from original investigators (individual researchers)

Only a few participants said, “there was no documentation” (PS15), and a few had worked with a minimal amount of documentation—at least a description of the original study or “some sort of annotation about data” (PS17). The cases with little or no documentation were generally the data from individual researchers or individual research teams rather than from institutions or data repositories. While participants did not always know why there was no documentation, a few assumed that the individual researchers had not expected other researchers to be interested in using their data.

For cases with little or no documentation, some participants already had a previous relationship with individual researchers, so they were able to understand the data through direct interaction with the original investigators. Further, a close relationship with individual researchers enabled participants to use data that were less well-documented, as they had access to the individual researchers. However, relying on the individual researchers for information about the data and understanding was not always easy. PP03 tried to understand data from scratch:

PP03: When I actually start getting into the data sets, [I found that] it takes just an extraordinary amount of emails or conference calls with the scientists involved [in data creation] to just understand how the data was collected and potential problems that may have popped up when the data was collected. Protocol deviations and things like that.

In contrast, PS18’s experience was “very smooth” and it involved “just a few meetings with [the original investigator], I think it’s actually one or two, then I can always call her for more information.” The differences in these experiences may be due to the participants’ experiences, tacit knowledge, work styles, or the closeness of the relationship with the individual researchers.

Reading publications

Participants also went through publications: articles published by the original investigators or articles published using the data. Publications were already discussed as an important factor in participants’ initial trust development. Participants used the publications to aid their understanding of the data and their use for their own research: “there was a lot of information beyond the actual codebook” (PS07). Typically, these publications had a paragraph with brief information about the data and its strengths and significance. This gave “a nice summary of the data” (PS18) to the participants, saving them hours of research time to understand the background of the studies.

In addition, original papers showed the participants how the original investigators had used the data, which several participants found useful. Participants also looked for other pieces of information “that would complement my findings” (PP06). They took a special interest in reading “the limitations of the data from the people who already experienced using it” and wanted to make sure that they “didn’t really see any huge limitations for the particular purpose for this data set” (PP06). As discussed earlier, because participants respected publications that had undergone peer review and respected the authority of those publications, they believed that the quality of the data discussed in the publication “should be [at an] acceptable level” (PP06), not just according to the participants but also as evaluated by other researchers in the field.

Checking data

While understanding the data by reading documentation and publications, and interacting with the original investigators, participants continuously checked different aspects of the data. They engaged in a close examination of the data to see where they would spend “the bulk of [their] time” (PP05). Sometimes participants tested validity and reliability and checked for missing data. Some participants examined the data for consistency, transparency, and quality of documentation. What participants wanted to know was directly relevant to their trust judgments, which will be described further in section 3.3.2.