Within the scope of this thesis, we developed the Mediabase model for OSS projects shown in Figure 5.4. The detected non-human (Subsection 5.2.1) and human (Subsection 5.2.2) OSS actors are mapped to the elements of the Mediabase data model. Management tools used by OSS projects are reflected in the medium instances. Every medium stores artifacts, which are generated and consumed by processes performed by agents. For example, when a project participant sends an email to the project mailing list, the mailing list is the medium, the email is the artifact, and the project participant is the agent. An agent can take on different social roles. In the OSS development context, these roles include project manager, active developer, peripheral developer, active user, etc. The OSS media, artifacts, processes, and agents can be formally described as follows:
Medium = {Wiki*, MailingList+, Forum*, BugTracker*, RCS+};
Artifact = {Link, MailingPost, ForumPost, BugReport, Commit, SourceCodeFile};
Process = {Acquisition, Search, Monitoring, Retrieval, Transcription, Addressing, Change, Merge, Report};
Agent = {ProjectLeader, CoreMember, ActiveDeveloper, PeripheralDeveloper, BugReporter, PassiveUser, Stakeholder, ...}.
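The four sets can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the class names, the `Event` record, and the choice of `Addressing` as the process behind sending an email are not taken from the Mediabase model itself, and the cardinalities (`*`, `+`) are not enforced.

```python
from dataclasses import dataclass
from enum import Enum


class Medium(Enum):
    WIKI = "Wiki"
    MAILING_LIST = "MailingList"
    FORUM = "Forum"
    BUG_TRACKER = "BugTracker"
    RCS = "RCS"


class Artifact(Enum):
    LINK = "Link"
    MAILING_POST = "MailingPost"
    FORUM_POST = "ForumPost"
    BUG_REPORT = "BugReport"
    COMMIT = "Commit"
    SOURCE_CODE_FILE = "SourceCodeFile"


class Process(Enum):
    ACQUISITION = "Acquisition"
    SEARCH = "Search"
    MONITORING = "Monitoring"
    RETRIEVAL = "Retrieval"
    TRANSCRIPTION = "Transcription"
    ADDRESSING = "Addressing"
    CHANGE = "Change"
    MERGE = "Merge"
    REPORT = "Report"


class Agent(Enum):
    # The set in the text is open-ended; only the named roles are listed.
    PROJECT_LEADER = "ProjectLeader"
    CORE_MEMBER = "CoreMember"
    ACTIVE_DEVELOPER = "ActiveDeveloper"
    PERIPHERAL_DEVELOPER = "PeripheralDeveloper"
    BUG_REPORTER = "BugReporter"
    PASSIVE_USER = "PassiveUser"
    STAKEHOLDER = "Stakeholder"


@dataclass(frozen=True)
class Event:
    """Hypothetical record: an agent performs a process on an artifact in a medium."""
    agent: Agent
    process: Process
    artifact: Artifact
    medium: Medium


# The mailing list example from the text. The role and the process
# are assumptions chosen for illustration only.
email_event = Event(
    agent=Agent.PERIPHERAL_DEVELOPER,
    process=Process.ADDRESSING,
    artifact=Artifact.MAILING_POST,
    medium=Medium.MAILING_LIST,
)
```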
The data collected from the OSS repositories can be considered as a set of facts. For example, a user X submitted the file Y to the RCS of the project Z on the date W. Here, X, Y, Z, and W are facts. The relationships between facts, on the other hand, can be defined in arbitrary ways:
• participation in the same forum thread,
• direct reply in the mailing list,
• commit on the same component,
• commit on the same file,
• discussion of the same bug,
• ...
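As an illustration of how one of these relationships could be derived from collected facts, the following minimal sketch links users who committed to the same file. The fact tuples, user names, and the function name `same_file_edges` are hypothetical and serve only as an example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical facts of the form (user X, file Y, project Z, date W).
commits = [
    ("alice", "Seq.pm", "BioPerl", "2010-01-05"),
    ("bob", "Seq.pm", "BioPerl", "2010-01-07"),
    ("carol", "Align.pm", "BioPerl", "2010-02-11"),
    ("alice", "Seq.pm", "BioPerl", "2010-03-01"),
]


def same_file_edges(facts):
    """Return undirected edges between users who committed to the same file."""
    authors_by_file = defaultdict(set)
    for user, path, project, _date in facts:
        authors_by_file[(project, path)].add(user)

    edges = set()
    for authors in authors_by_file.values():
        # Sorting gives each undirected edge one canonical orientation.
        for a, b in combinations(sorted(authors), 2):
            edges.add((a, b))
    return edges


edges = same_file_edges(commits)
```

The other relationships in the list would follow the same pattern with a different grouping key, for example the thread identifier instead of the file path.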
Figure 5.4: Mediabase Model for OSS Projects.
Figure 5.5: Social Networks of the BioPerl Developers. (a) Developers Connected by Shared Communication. (b) Developers Connected by Collaborative Development.
An SN communication relationship can even be defined by means of collaboration across several media, e.g. if two users have both participated in the same mailing list thread and modified the same source file. Further, the relationships can be defined as either directed or undirected. We can consider the complete project history or analyze a subset of it, for example, only spam-free discussions. Alternatively, the data subset can be defined using the time dimension: data from the last year, the last month, or even the last day. Furthermore, restrictions can be introduced based on the agent types, e.g. only postings from newcomers, bug fixers, or core members. The selected relationship defines the social structure of the OSS network, another actor of the Mediabase model for OSS projects. We formally define the following OSS network types:
LinkageType = {SameThread, RepliesTo, CommitOnSameComponent, CommitOnSameFile, CommittedBySameUser, ...}
MediumType = {AllProjectMedia, RCS, RCSAndBugtracker, ...}
ArtifactType = {AllPosts, Spamfree, ThreadStarter, FromDevelopersOnly, ...}
TimeType = {WholeProjectPeriod, Years, Month, Weeks, Days}
AgentType = {AllProjectPersons, Newcomers, Bugfixer, ...}
...
AgentMediumType = {AllMediumAllArtifacts, ...}
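A network type can be read as a combination of filters applied to the collected data before the network is built. The sketch below is a simplified illustration, not the thesis implementation: the `Post` and `NetworkType` classes, their field names, and the use of a single `since` date in place of the full TimeType set are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical mailing list or forum post."""
    author: str
    thread: str
    posted: date
    is_spam: bool
    author_type: str  # e.g. "Newcomers" or "Bugfixer"


@dataclass(frozen=True)
class NetworkType:
    """Hypothetical selection of one value per type set."""
    linkage_type: str = "SameThread"
    artifact_type: str = "AllPosts"
    agent_type: str = "AllProjectPersons"
    since: Optional[date] = None  # simplified stand-in for TimeType


def select_posts(posts, network_type):
    """Keep only the posts admitted by the given network type."""
    selected = []
    for post in posts:
        if network_type.artifact_type == "Spamfree" and post.is_spam:
            continue
        if (network_type.agent_type != "AllProjectPersons"
                and post.author_type != network_type.agent_type):
            continue
        if network_type.since is not None and post.posted < network_type.since:
            continue
        selected.append(post)
    return selected


posts = [
    Post("alice", "t1", date(2010, 3, 1), False, "Newcomers"),
    Post("bob", "t1", date(2010, 3, 2), True, "Newcomers"),
    Post("carol", "t2", date(2009, 5, 9), False, "Bugfixer"),
]
spam_free_recent = NetworkType(artifact_type="Spamfree", since=date(2010, 1, 1))
selected = select_posts(posts, spam_free_recent)
```

With the default `NetworkType()`, all three posts pass the filter; the spam-free, time-restricted type keeps only the first one.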
Each network type is a specialization of the OSS network entity, and each one belongs to an OSS network of a certain type. Depending on the selected network type, the agents can have different ranks. For example, we can generate two networks based on the data from one OSS project. The first SN has the project participants as nodes, and two nodes are linked if the corresponding persons have collaboratively participated in the project's RCS. The second SN has the same nodes, but they are linked by shared communication. Figure 5.5a displays the SN of BioPerl developers who are connected through exchanging emails in the project mailing lists. Figure 5.5b depicts the SN of those developers who have edited at least one file in common. The first SN mirrors the social structure of the communication flow, whereas the second one reflects system development. The social topographies of both networks are quite similar. The core of the community can be visually detected in both SNs. The effort distributions within the networks follow a power law, as expected. However, the social status of a project participant can differ depending on the selected SN. A very active developer who is centrally placed in the developer network because of his/her efforts might lack communication ambition and hence be socially unimportant in the communication network.

Table 5.2: Commit-Message Mapping.
COMMIT        MESSAGE        Description
ID            ID             contribution identifier
Subject       Subject        addressed topic
Content       Comment        textual description
Date          Author Date    creation date
Project.ID    Project.ID     ID of the project
Person.ID     Person.ID      creator ID
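The observation that one developer can rank differently in the development network and in the communication network can be illustrated with a toy example. The sketch below uses node degree as the rank measure, which is an assumption made for illustration; the text does not prescribe a specific measure, and the developer names and edges are invented.

```python
from collections import Counter


def degree_ranking(nodes, edges):
    """Rank nodes by number of incident edges, highest first (ties by name)."""
    degree = Counter({node: 0 for node in nodes})
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(nodes, key=lambda node: (-degree[node], node))


developers = ["dev1", "dev2", "dev3", "dev4"]

# Hypothetical edges: dev1 shares files with everyone ...
development_edges = [("dev1", "dev2"), ("dev1", "dev3"), ("dev1", "dev4")]
# ... but takes no part in the mailing list discussions.
communication_edges = [("dev2", "dev3"), ("dev2", "dev4"), ("dev3", "dev4")]

development_rank = degree_ranking(developers, development_edges)
communication_rank = degree_ranking(developers, communication_edges)
```

In this toy data, `dev1` comes first in the development ranking and last in the communication ranking, while the node set is the same in both networks.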
The SNs can even be built over several OSS projects, allowing for a cross-project analysis. Figure 5.6 displays the SN of two OSS projects: Biopython and BioPerl. The clustering of this SN can be used to identify strong internal sub-structures: the red cluster is the Biopython network and the light pink cluster is the BioPerl network. In addition, we observe a strong interconnection between the two projects. Project participants who link both networks are called knowledge brokers. Knowledge brokers are an important force for knowledge transfer from one project to another. Cross-project analysis opens up many study opportunities. For example, we can investigate the migration of participants from one project to its competitors. The realized hierarchical decomposition ensures the same level of aggregation for all management resources and supports cross-project analysis.
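One simple way to approximate the set of knowledge brokers is to look for persons who appear in the networks of both projects. This is a minimal sketch under that assumption; the edge lists and names are hypothetical, and a real analysis would also need to resolve the same person's identity across projects.

```python
def participants(edges):
    """Collect every person that appears in at least one edge."""
    return {person for edge in edges for person in edge}


def knowledge_brokers(network_a, network_b):
    """Persons present in both networks (simplified broker criterion)."""
    return participants(network_a) & participants(network_b)


# Hypothetical edge lists for the two projects.
bioperl_edges = [("alice", "bob"), ("bob", "carol")]
biopython_edges = [("carol", "dave"), ("dave", "erin")]

brokers = knowledge_brokers(bioperl_edges, biopython_edges)
```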
At a more detailed level of aggregation, the artifacts are modeled as follows. Figure 5.7a depicts the ER model for storing the data collected from the RCS. The ER model for the data collected from the mailing lists is shown in Figure 5.7b. Both models are based on the entities person (= agent) and project. The process of submitting a commit to the project RCS can be linked to the process of sending a message to the project mailing list. The attributes of the commit and message entities overlap, as seen in Table 5.2. Other medium types are modeled in a similar way. This data management approach allows us to study community behavior across different media.
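Because the commit and message attributes overlap, both can be mapped onto one common record and analyzed together. The following sketch assumes a hypothetical `Contribution` record and hypothetical dictionary keys; it is not the actual database schema of Figures 5.7a and 5.7b.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """Hypothetical common record for commits and mailing list messages."""
    medium: str          # "RCS" or "MailingList"
    contribution_id: str
    subject: str
    text: str
    created: str
    project_id: str
    person_id: str


def to_contribution(medium, row):
    """Map a row (dict with assumed keys) onto the shared attributes."""
    return Contribution(
        medium=medium,
        contribution_id=row["id"],
        subject=row["subject"],
        text=row["text"],
        created=row["date"],
        project_id=row["project_id"],
        person_id=row["person_id"],
    )


# Invented sample rows for illustration only.
commit_row = {"id": "c1", "subject": "Fix parser", "text": "Handle empty input",
              "date": "2010-01-05", "project_id": "bioperl", "person_id": "p1"}
message_row = {"id": "m1", "subject": "Parser bug", "text": "Seen on empty files",
               "date": "2010-01-04", "project_id": "bioperl", "person_id": "p1"}

contributions = [
    to_contribution("RCS", commit_row),
    to_contribution("MailingList", message_row),
]

# Activity of each person across both media.
activity = Counter(c.person_id for c in contributions)
```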
Figure 5.8 represents the ER model for the OSS networks. Each network is defined by its network connections. The network connections reflect which project participants are interconnected.

Figure 5.6: Community Detection within Cross-Project Network (BioPerl and Biopython).