Developer retention in software ecosystems

This page provides the data and scripts needed to replicate the results of our submitted article on developer retention in software ecosystems.

You can download the data and scripts here. The zip file contains three files:
– script.R: The R script that runs the survival analyses and exports all the survival curves (in PDF format)
– 2 CSV files: One per ecosystem, listing that ecosystem's developers
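
For readers who want a feel for what a survival curve is before opening script.R, here is a minimal Kaplan-Meier sketch in Python. It is not taken from script.R, and the choice of the Kaplan-Meier estimator is an assumption, since this page does not say which method the script uses. The function name kaplan_meier and the toy inputs are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] for each distinct time t with at least one event.

    durations: observed time per developer (e.g. months of activity)
    events:    1 if the developer abandoned at that time, 0 if still active
               (censored). This 0/1 encoding is an assumption of the sketch.
    """
    at_risk = len(durations)
    abandoned = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving = Counter(durations)  # everyone leaves the risk set at their time

    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = abandoned.get(t, 0)
        if d > 0:
            # Multiply by the fraction of at-risk developers who did not abandon.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data, made up for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0])
for t, s in curve:
    print(f"month {t}: S = {s:.2f}")
```

Each step of the curve only drops at months where at least one developer abandons; developers who are still active leave the risk set without lowering the curve.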

The columns of each csv file correspond to:
– user: The user id based on the GHTorrent dataset
– ta_abandoner (technical activity): Binary flag indicating technical abandonment (0: active, 1: abandoner)
– sa_abandoner (social activity): Flag indicating social abandonment (0: active, 1: abandoner, -1: the developer was never socially active)
– ta_duration_months (technical activity): Number of months between the first and last commit
– sta_duration (socio-technical activity): Number of months between the first and last commit or social message
– sa_messages (social activity): Number of messages that the developer has exchanged with other developers
– sa_activity_months (social activity): Number of distinct months in which the developer exchanged messages with other developers (differs from sta_duration_months, since months with no messages are not counted in sa_activity_months)
– sa_largest_delta (social activity): Largest social inactivity gap measured in months
– ta_commit_contributions (technical activity): Number of commits
– ta_activity_months (technical activity): Number of distinct months in which the developer made commits (differs from sta_duration_months, since months with no commits are not counted in ta_activity_months)
– ta_largest_delta (technical activity): Largest technical inactivity gap measured in months
– developerTechnicalIntensity (technical activity): TI value (refer to the paper for the definition)
– developerTechnicalSpread (technical activity): TS value (refer to the paper for the definition)
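
The three-valued sa_abandoner column is easy to misread as a plain boolean, so the following Python sketch shows one way to load a few of the columns and set aside developers who were never socially active. It assumes comma-separated files with a header row, which this page does not confirm. The sample rows are invented for illustration, and load_developers is a hypothetical helper, not part of the provided scripts.

```python
import csv
import io

# Invented sample rows; the real CSV files contain more columns.
SAMPLE = """user,ta_abandoner,sa_abandoner,ta_duration_months
101,0,1,12
102,1,-1,3
103,1,0,7
"""


def load_developers(handle):
    """Parse developer rows, mapping sa_abandoner == -1 to None."""
    rows = []
    for row in csv.DictReader(handle):
        sa = int(row["sa_abandoner"])
        rows.append({
            "user": row["user"],
            "ta_abandoner": int(row["ta_abandoner"]) == 1,
            # -1 means "never socially active", so it is neither True nor False.
            "sa_abandoner": None if sa == -1 else sa == 1,
            "ta_duration_months": float(row["ta_duration_months"]),
        })
    return rows


developers = load_developers(io.StringIO(SAMPLE))

# Keep only developers for whom social abandonment is defined.
socially_active = [d for d in developers if d["sa_abandoner"] is not None]
print(len(developers), len(socially_active))
```

To read one of the real files, replace io.StringIO(SAMPLE) with an opened file handle, for example open(path, newline=""), where path points to one of the two CSV files from the zip archive.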