Open Data versus the Mosquito
The current global panic about zika can be boiled down to a “data gap” issue: gaps in understanding why it has started spreading so rapidly now, a gulf in fathoming its effects on pregnant women (evidence linking zika and microcephaly is still only spatio-temporal rather than causal), and gaps in sharing the research data and clinical specimens that will enable the global research community to keep one step ahead of the spread of the virus. As with Ebola, there has been much frustration at many key players not sharing these materials, despite the fact that in a life-and-death situation wild speculation and panic fill the vacuum, and closed data risks lives.
All this makes the zika crisis a perfect opportunity to harness the benefits and showcase the utility of open approaches, in particular open and collaborative efforts using Open Data and Open Source hardware. An international group of makers / hackers / scientists / citizen scientists is trying to develop innovative measures against zika, and Open Data Hong Kong have teamed up with MakerBay to join these efforts. Join us at the zika hackathon on the 16th February at MakerBay in Yau Tong (see their event page here). We’ll be linking up via the global Google hangout with other zika hackathon participants in Brazil, Australia, Singapore, and beyond, then discussing and pitching projects where we can contribute from here in Hong Kong. We will approach this from both our data hacking and hardware hacking perspectives, looking at where these different strands of “open” can be combined to produce crowdsourced data collection tools and apps, to see if citizens can do better than the supposed experts in filling in these data gaps.
The “Asian tiger mosquito” Aedes albopictus, which is among 60 types of mosquito that can carry the virus if it bites an infected person, is endemic to Hong Kong. The warmer year-round weather and more extreme rainfall patterns we are currently seeing will make the city even more favourable for mosquitoes from the Aedes genus, sparking warnings from local health officials to eliminate breeding areas. On top of the threat of zika, we already have sporadic dengue outbreaks from these vectors, and the Hong Kong government currently has an Oviposition Trap (Ovitrap) screening program to detect the presence of adult mosquitoes. With only 52 locations across Hong Kong selected for the vector surveillance, and the mosquitoes having a roughly 200m range, more than 98% of Hong Kong is currently not covered, and there is a need for much more data collection and better presentation (the FEHD currently presents the results as not very helpful PDFs). The more dynamic, data-driven approaches to dengue reporting that Singapore uses, Kaggle competitions for West Nile Virus modelling, and Spanish efforts at crowdsourcing tiger mosquito spotting (with no Hong Kong data collected to date) show a few approaches we could follow here.
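As a rough back-of-the-envelope check on that coverage figure, the calculation can be sketched as below. This is only an illustration under stated assumptions: each of the 52 surveillance locations is treated as a circle with a 200m radius, the circles are assumed not to overlap (which gives an upper bound on coverage), and Hong Kong's land area is taken as roughly 1,106 square kilometres, a figure that is not from the post itself.

```python
import math


def ovitrap_coverage(n_locations: int, range_m: float, land_area_km2: float) -> float:
    """Upper bound on the fraction of land within range of a surveillance location.

    Assumes each location covers a circle of radius `range_m` and that no
    circles overlap, so the true coverage can only be lower than this.
    """
    radius_km = range_m / 1000.0
    covered_km2 = n_locations * math.pi * radius_km ** 2
    return min(covered_km2 / land_area_km2, 1.0)


# Assumption: approximate land area of Hong Kong in square kilometres.
HK_LAND_AREA_KM2 = 1106.0

covered = ovitrap_coverage(n_locations=52, range_m=200, land_area_km2=HK_LAND_AREA_KM2)
print(f"covered: {covered:.2%}, not covered: {1 - covered:.2%}")
```

Under these assumptions the covered share comes out at well under 1% of the territory, which is consistent with the claim that more than 98% of Hong Kong is not covered.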
Are you interested in getting involved and using your creativity to develop innovative technologies and contribute to understanding zika and preventing it from spreading? Let’s meet up! The event will be co-hosted by Scott from ODHK and Ajoy, Jacky and Nicolas from MakerBay, and efforts will be longitudinal, following the ongoing international hackathon efforts. For more see:
- Zika International Hackathon Facebook group : https://www.facebook.com/groups/zikahack/
- First Hackathon meetup / Google hangout : https://www.youtube.com/watch?v=VXb_44R_tNA (embedded below)
- Notes from the meeting : https://docs.google.com/document/d/1f0g1kWn8HMlU0hmpl1QMy0uUAs02QYP3b5ru1T1xZhc/edit?pref=2&pli=1#heading=h.p73jtyc2cxn
Tuesday, February 16th 2016, 6:00pm
Location: MakerBay, 16 Sze Shan Street, C1 Yau Tong Industrial Building Block 2, Yau Tong, Kowloon
See this on Google maps.
See this event on Facebook.
UPDATE 23/2/16: MakerBay have a write-up of this event now posted, and you can see the archived livestream below. Thanks to everyone who attended, and keep following to see how the pitched projects develop.
Last week the Open Science Working Group of ODHK had an Oped in the South China Morning Post (SCMP) discussing the issue of fighting academic fraud through the use of Open Data. This is a particularly topical issue at the moment, with recent scandals implicating many academics in Mainland China in large-scale peer-review fraud covered in the Washington Post. With the kind permission of SCMP we are posting an updated and extended version of the piece here and, being good Open Data purists, include links to much of the source material discussed.
The scandal of scientific impact
The idealized view of science as the curiosity-driven pursuit of knowledge to understand and improve the world around us has been tarnished by recent news of systematic fraud and mass retraction of research papers from the Chinese academic system, and allegations of attempts to game the peer-review system on an industrial scale. With much of our R&D funded through government, we all hope our tax dollars are spent as wisely as possible, and around the world research funders have developed methods of assessing the quality of their funded researchers’ work. One of the most widely used metrics to assess researchers is the Journal Impact Factor (JIF), a (proprietary, closed access) service run by Thomson-Reuters that ranks the academic journals that scientists publish in to get credit. While many countries have tried to broaden their assessment systems to take into account a more balanced view of a researcher’s impact, in China the number of publications in JIF-ranked journals is currently the only activity that researchers are judged by, and huge amounts of money are changing hands (often hundreds of thousands of RMB in payment for a single publication in the top-ranked journals) through this system.
This biased focus on one metric above all others has directly led to large-scale gaming of the system and a black market of plagiarism, invented research and fake journals. Following on from previous exposés of an “academic bazaar” system where authorship on highly ranked papers can be bought, Scientific American in December uncovered a wider and more systematic network of Chinese “paper mills” producing ghostwritten papers and grant applications to order, linked to hacking the peer review system that is supposed to protect the quality and integrity of research. The first major fall-out from this occurred last month, with the publisher BioMed Central (BMC) retracting 43 papers for peer review fraud, the biggest mass-retraction carried out for this reason to date, and increasing the number of papers retracted for this reason by over a quarter. Many other major publishers have been implicated, with PLOS, the publisher of the world’s largest journal, also issuing a statement that they are investigating linked submissions. It takes a great amount of time and effort employing Chinese-speaking editorial teams to investigate and contact all of the researchers and institutions implicated, and BMC should be applauded for doing this and fixing the scientific record so quickly [COI declaration: Scott Edmunds is an ex-employee of BMC, and he and Rob Davidson are collaborating with them through GigaScience Journal].
To get an idea of the types of research uncovered and implicated, it is possible to see the papers retracted last month, and Retraction Watch has covered the story in detail. The Committee on Publication Ethics has also issued a statement. Guillaume Filion in his blog has done some sterling detective work providing insight on the types of papers written by these “paper mills” and “guaranteed publication in JIF journal” offering companies still advertising their services. The likely production-line explosion of medical meta-analysis publications coming from China has been well known for a number of years, but looking at the list of publications retracted by BMC in March shows a worrying introduction of many other research types such as network analysis.
As in J. B. Priestley’s famous morality tale, An Inspector Calls, any evil comes from the actions or inactions of everyone. On top of the need for better policing by publishers, funders and research institutions, there need to be fundamental changes to how we carry out research. Without a robust response and fundamental changes to their academic incentive systems there could be long-term consequences for Chinese science, with a danger that this loss of trust will lead to fewer opportunities to collaborate with institutions abroad, and potentially build such skepticism that people will stop using research from China.
While we are rightly proud of Hong Kong’s highly regarded and ranked university system (with three universities ranked in the world top 50), we are not immune to the same pressures. While funders in Europe have moved away from using citation-based metrics such as JIF in their research assessments, the Hong Kong University Grants Committee states in their Research Assessment Exercise guidelines that they may informally use it. In practice some of the universities do pay bonuses related to the impact factor of the journals their researchers publish in, leading to the same temptations and skewed incentive systems that have led to these corrupt practices in China. From looking at the list of retracted papers, fortunately on this occasion no Hong Kong based researchers were implicated. With our local institutions increasing their ties across the Pearl River through new joint research institutes and hospitals, and these scandals likely to run and run, it will be a challenge for our universities to remain unblemished for much longer.
Can We Fix it? Yes We Can!
If the impact factor system is so problematic, what are the alternatives? Different fields have different types of outputs, but there are factors that should obviously be taken into account like quality of teaching, and the numbers of students passing on to do bigger and better things. Impact can be about changing policy, producing open software or data that other research can build upon, or stimulating public interest and engagement through coverage in the media. Many of these measures can also be subject to gaming, but having a broader range of “alternative metrics” should be harder to manipulate. China is overtaking the US to become the biggest producer of published research, but ranks only ninth in citations, so there obviously needs to be a better focus on quality rather than quantity.
The present lack of research data sharing has led to what is being called a ‘reproducibility crisis’, partly fuelled by fraudulent activity but very often just from simple error. This has led to some people estimating that as much as 85% of research resources (funding, manhours etc) are wasted. Science is often lauded as being a worthy investment for any government because the return to the economy is more than that put in. What benefits could be gained if there was an 85% improvement on that return? How many more startups and innovative technologies could be produced if research was actually reusable?
There is a growing movement from funders across the world to encourage and enforce data management and access, and we at Open Data Hong Kong are cataloguing the policies and experiences of Hong Kong’s research institutions. Sadly, at this stage we seem to be far behind other countries, currently ranking 58th in the global Open Data Index (just falling from 54th earlier in the year). One of the main benefits of open data is transparency, which would have made the current peer review scandal much harder to carry out. It is encouraging that the Hong Kong Government is already promoting release of public sector data through the newly launched Data.Gov.HK portal, but it is clear that our research data needs to be treated the same way. ODHK is the first organization in Hong Kong (and 555th overall) to sign the San Francisco Declaration on Research Assessment, which is trying to eliminate the use of journal-based metrics. To help change the skewed incentive systems we would encourage others to join us by signing at: http://am.ascb.org/dora/
Scott Edmunds, Rob Davidson and Waltraut Ritter; Open Data Hong Kong.
Naubahar Sharif; HKUST.
See SCMP for the published version of the Oped here: http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentives-curb-research
On Wednesday the Hong Kong Government’s Office of the Government Chief Information Officer unveiled the revamp of its Public Sector Information (“PSI”) portal, taking pride in still making available its 3000 datasets for the public to use for both commercial and non-commercial purposes. Hong Kong’s open data enthusiasts familiar with the old site “Data.One” will appreciate the new “Data.Gov.HK” site’s easier navigation, improved functionality and categories. With this revamp, the government is demonstrating its continued support for the availability of PSI and echoing the Financial Secretary’s 2014-2015 commitment to push all government bureaux and departments to:
“[make] all government information released for public consumption machine-readable in digital formats from next year onwards to provide more opportunities for the business sector.”
The revamp is a step towards this goal, identifying which departments submit which data, and its ease of navigation encourages access to and use of existing government datasets, which is good. Though there are some snags and errors in the new site, this is to be expected of any website update. Surely government is reviewing the problems, but as the site doesn’t support a dialogue or connection with the public, that is hard to tell.
While governments around the world are achieving greater policy review through scrutiny, supporting greater civic engagement, and gaining efficiency by supporting Open Data, the revamp demonstrates that the Hong Kong government is still just catching up with past trends in publishing government data. While there is no debate that government making data available “opens up new business opportunities” as well as “bring convenience to the public and benefit society as a whole”, the site falls short in its assumption that it is enough to publish a number of datasets without regard to data quality, or to update a site, and that apps and benefits will just materialize. To realize benefits similar to those seen by other governments, the HK government should work with open data enthusiasts and adopt the same open data principles and standards as the international community.
Open Data Hong Kong recommends the government adopt Open Data standards and principles, and reflect that with the Data.Gov.HK site. Namely:
Adopt an open license on the datasets
The Terms and Conditions are problematic for supporting app-building, collaboration, and analysis of government data. The following terms are examples:
- “you shall reproduce and distribute the Data accurately, fairly and sufficiently;”
Data used for apps and analysis will get reformatted, and inaccurate data will have to be corrected and modified by developers and analysts. Would this be considered breaking this condition? It is unclear, ambiguous, and thus stifling. Although the terms and conditions include a waiver of responsibility by the government, this condition is confusing at the least and overbearing at the most. OGCIO should adopt an open license such as CC0, or copy the UK Open Government Licence, for the datasets.
Improve communication & engagement
Users of the site can’t find out which datasets are new, if there are any. As much as the site would like to see use among the 3000+ datasets (with more to be added regularly), supporting better communication and engagement among users would be useful for information to flow both ways.
It’s not enough just to have datasets available, the data has to be effective, relevant, usable. How can users provide feedback about the datasets and communicate to government about the effectiveness (or uselessness) of a dataset? Or that a dataset is missing? This is not in the functionality of the site. An example of doing this comes from the Government of Canada where they ask for users to “Suggest a data set”, as well as functionality for users to provide a rating on datasets, and the hosting of open data competitions (which are also open to people outside of Canada).
It’s about the data. Data Quality.
Users of the datasets continue to have problems with the quality of the data. This hasn’t changed with the site revamp. Users have reported unclear or nonexistent schemas for datasets, poorly chosen data types, and no API support for connecting to the data. Although datasets can be very different from one another, the right file type and data connection go a long way towards supporting app development and value for the public, and avoiding severe headaches for app developers and head-scratching for analysts. We encourage OGCIO to provide the right expertise and support to bureaux and departments making datasets available.
Commitment to the future
The data.gov.hk update demonstrates government’s support for making data available to support the public. Open Data Hong Kong continues to reach out to OGCIO and the rest of government to improve this, while recommending full support of open data to best realize public benefit.
Eternal Sunshine of the Spotless Internet
The Michel Gondry film “Eternal Sunshine of the Spotless Mind”, a Sci-Fi Romantic Comedy tackling the unpredictable consequences of erasing all memories of a love affair, is quite a good primer on the potential side-effects of “Right to be Forgotten” (RTBF) policies, where online search engines can be compelled to “forget” information that is deemed outdated. As the internet is increasingly becoming the host of our collective memories, moves to delete specific parts of our digital histories are likely to have equally painful and unintended outcomes. Our colleagues at Open Knowledge are clear that these moves pose a threat to transparency and open data, and have set up a Personal Data and Privacy Working Group to discuss the matter. This is currently a hot issue in Hong Kong, with the Hong Kong Privacy Commissioner Allan Chiang Yam Wang recently expressing strong support for the controversial policy (including on his blog), so yesterday ODHK decided to delve further into the issue by hearing some local perspectives on the matter. After a bit of a break, the ODHK meets were back with a jam-packed bill including presentations on RTBF by Frankie Chu of InMedia and economic governance activist David Webb.
Hong Kong InMedia is a citizen journalism advocacy organisation, and Frankie Chu started proceedings by pointing out that for countries with poor access to information legislation, such as those in Asia, any moves to alter search engine results will disproportionately affect citizens’ ability to access information freely. The lack of an archives ordinance also makes Hong Kong particularly vulnerable, something which the Hong Kong Archives Action group has highlighted. InMedia has covered the topic regularly, and has put together a fantastic animation that gives a great overview of the potential risks of RTBF legislation, particularly from a Hong Kong perspective. InMedia and the Hong Kong Civil Liberties Union have also written an open letter and are collecting signatures to let the Hong Kong Privacy Commissioner know that public opinion is against these moves.
David Webb gave a presentation based on a recent talk he gave at the Christmas AGM of the Hong Kong Library Association, covering a lot of similar ground but going into more detail on the test cases that have brought us to where we are today. He highlighted that Hong Kong media organisations such as the HK Standard are already reducing their archives, and that the most chilling consequence would be the growing political leverage that search engines like Google, or J. Edgar Hoover types in security services and governments, would have by becoming gatekeepers of information. David’s slides are available from his Webb-site here.
The Meet ended with a Q&A from Frankie and David, and updates and news from some of the ODHK working groups such as Guy Freeman and the OpenGov.HK access to information portal, and Scott Edmunds and the Open Science Working Group. Bastien plugged the ODHK and Code4HK “Hack the Hong Kong Budget” make on the 28th February, and the next ODHK meet will be on the 9th March, so watch this space for further news and updates. We look forward to seeing many of you there.
This month government, academics, and citizens (and Open Data Hong Kong!) joined the HK Government Efficiency Unit to share insights on the potential of better reporting and sharing of Air Quality monitoring data. Entitled “Open Data and Citizen Science Reporting Potential for Air Quality Monitoring”, the seminar was an exploratory event to see what could happen if we share what people are doing with air quality data, the challenges we face, and the potential ahead for pilot projects and more. Weather and pollution data are no-brainer areas to open up to an Open Data approach, as they are topics of interest for concerned and engaged citizens, and the Hong Kong Observatory already makes much of this data available to the public through the AQHI (Air Quality Health Index) website. Connecting the producers of this data with downstream users, from government, academic and non-academic backgrounds, should help maximize the value of this precious data. Open Data Hong Kong are experienced and well placed to advocate as a collective voice for more datasets and better quality data on the environment and air quality, and we’d like to thank Kim Salkeld, the head of the Efficiency Unit, for inviting us.
Mart has already posted his quite detailed notes on the talks, but to summarize, the first part of the afternoon featured representatives from many of the relevant government departments, like the Environmental Protection Department, the Agriculture, Fisheries and Conservation Department, the HK Observatory, and the Efficiency Unit’s “1823” enquiries and complaints hotline. Ivy So from the BioDiversity Division of the Agriculture, Fisheries and Conservation Department presented on some of the interesting apps they’ve developed like “tree walks”, but most interesting from a citizen science perspective have been moves to allow the public to post pictures and register animal sightings on their HK biodiversity database and Eco map portal. They’ve also promised to release much of this data as XLS files, so watch this space to see the result. It would be great to have hackathons, visualisations and apps built using this data, and there is a shortage of useful biodiversity data in the global GBIF biodiversity databases, so anything to boost this is much needed. John Chan from the HK Observatory also covered some of the Citizen Science side, presenting on their engagement with schools and interested individuals through their Community Weather Observing Scheme.
The second part of the afternoon presented the point of view of the public. This is where Open Data Hong Kong stepped in, with Bastien from ODHK setting the scene with a short overview of the benefits of openness and transparency, and showing a few case studies on how open pollution data has been successful in countries such as the UK (see his slides here). Representing Code4HK, Vincent Lau and Harry Ng gave a civic hacker perspective, showing examples of what they and others are doing with this data (e.g. their real time visualizations of the AQHI data thrown together in one hour), and highlighting the shortcomings and difficulties of working with the data in its current form. Andrew Leyden was the last speaker in the ODHK section, and also highlighted the potential and problems experienced by hands-on users of HK Observatory data when building the “Hong Kong Air Pollution” App.
What was clear from this section reiterates the main issue with publicly accessible datasets in Hong Kong. They have great potential, but are poorly presented and released under unhelpfully restrictive licenses. Much time and effort is therefore spent unnecessarily scraping, cleaning and processing the data, and the datasets are neither legally nor practically interoperable with others, all of which puts off many potential users. Bastien showed Hong Kong’s ranking in the global open data index, where we are placed 13th in the world for open emissions data, and the main thing preventing our (70%) score from topping the table was the lack of true open licensing. This would be a very easy issue to fix, costing nothing, and massively increasing the potential utility and reuse of our data.
The final section of the workshop brought on some of the formal experts from the Environmental Protection Department and local academics working in this area. Zhi Ning from CityU and Alexis Lau from HKUST both presented data from mobile pollution detectors, with CityU doing more grass-roots and medium-scale sensing work using open hardware Arduinos and working with local schools. HKUST have been working with larger-scale detection units built into trams, and they in particular have huge amounts of air quality and modelling data collected around the Pearl River Delta and beyond that would be fascinating to open up and let others work with. They currently host 400TB of data that is available for academic, if not other, purposes, and it would be great to liberate this data and see what could be done with it if it were made more widely available from public repositories.
There are plans to meet again, so watch this space. With its support, the Efficiency Unit can rally government departments around the table, and it would be great to set up a working group at ODHK to continue this work. Anyone interested in participating and helping us to advocate for open data to help understand and improve the local air quality should contact us.