Last May, a number of colleagues and I gathered at the Bellagio Centre for a workshop on “Big Data for Positive Social Change in the Developing World”, organized by Linnet Taylor and other former colleagues at the Oxford Internet Institute. The event brought together practitioners from around the world who use big (and open) data to discuss the potential benefits and dangers of digital data for data-driven policy-making and campaigns. Although I knew something about Ushahidi, Flowminder, Global Pulse and Grameen’s Applab, there were many other initiatives I knew very little about. Here are a few of the most exciting ones:
1) The organization Chequeado is fact-checking presidential speeches in real time in Argentina and Colombia in order to shine a light on politicians’ truths, lies and half-truths.
2) The Ugandan organization Black Monday is working to make information about the country’s budget more open and accessible to the public in order to encourage activism around it.
3) Tactical Tech and Privacy International are both campaigning on privacy issues. Privacy International has taken the UK government to court over breaches of citizens’ data and periodically publishes information about authoritarian governments issuing tenders for surveillance technologies. Tactical Tech has launched campaigns aimed at raising individual privacy awareness, such as “Me and My Shadow”. This project helps individuals see how much data they are making available about themselves online through platforms such as Facebook and Twitter and through their internet browsers.
A White Paper from the event has now been released, with more details on all the participants’ initiatives and on the ideas that came out of the meeting. Here are my highlights:
1) Governments and civil society need to become more ‘data-aware’ and better understand the risks that ‘exposure’ and ‘transparency’ pose for vulnerable groups. This concern becomes especially important as the costs of using online tools like Ushahidi fall over time. At present, there is a disconnect between the software developers building the relevant technological tools and the civil society groups developing strategies to use them. This disconnect becomes more dangerous when online tools are used in authoritarian contexts, where governments may wish to track down users. Organizations like Tactical Tech are therefore developing advocacy and awareness campaigns around big data and digital privacy to highlight the dangers of growing exposure. During the workshop, we also talked about how organizations such as Ushahidi might do more to educate users about possible harms.
2) Developers need to become more context-aware. Initiatives like Orange’s Big Data for Development Challenge in Côte d’Ivoire (where the company made its dataset available to researchers to “solve” development challenges) encourage international developers and researchers to “solve” problems from afar. Because potential harms can differ from context to context, it is important for local researchers and civil society groups to be involved in discussions about how data can be used to interpret social problems and social change. Grameen Applab is building an impressive database of information about Uganda’s rural poor, partly because it combines software developers’ knowledge with its network of community knowledge workers. These local actors should ideally also be involved in discussions about when and how data is shared with other groups.
3) Being able to use new (and digital) sources of data depends on older forms of data to ensure representativeness and verification. As Morten Jerven’s book Poor Numbers makes clear, alongside the hype around big data there is a dearth of funding for, and interest in, traditional forms of statistical collection like censuses. These two forms of data collection can strengthen one another, and it is particularly important to establish long-term funding for longitudinal data rather than lots of one-off surveys.
4) Initiatives such as Global Pulse, which use social media to track public opinion in developing countries, need more information about the demographic groups using social media platforms. While such approaches may not be able to tell us how a whole population feels about a certain issue, they may be able to tell us something about how the demographic using the platforms feels. This may mean changing the kinds of research questions we ask of social media data, and also thinking about how organizations like Global Pulse can reach under-represented groups using other tools.
5) We should not become so enamoured with ‘open data’ and digital tools that we lose track of who is actually able to see information online. Making data ‘open’ might mean “nailing a local authority budget to a town hall door every month”. Indeed, putting information online can actually make it less visible in certain situations (see my chapter in the Internet and Society book). The Open Data movement therefore needs to think about offline approaches as well.
6) We need a better discussion of the limitations of data anonymisation. In many cases, it is possible to de-anonymise data and identify individuals or groups. Companies that release their datasets to researchers need to be more upfront about the dangers of re-identification and to think more carefully about how they can ensure that data breaches do not occur. Similarly, in discussions with Nishant Shah from the Centre for Internet and Society about India’s biometric platform, I learnt how important it is to build privacy concerns into the technical architecture of data collection tools and databases (for example, by keeping sensitive information in separate silos and ensuring that there are checks within the system when data is combined). In other words, thinking about privacy requires both technical and legal expertise and involves a whole range of actors (which makes it difficult to resolve!).
7) As Piketty has argued in reference to tax and wealth data, when we regulate something, we develop better expertise about and awareness of the state of the art. At the moment, there is very little regulation governing how new digital forms of data can or should be used in developing countries, many of which have little domestic privacy protection. In the workshop, we discussed the possibility of international ethical review boards, similar to those in medical research. Establishing these kinds of bodies would also concentrate expertise and information on the state of the art of big data, and might allow the field to develop a more coherent research methodology and greater reflexivity.