Administrative Records for Survey Methodology
2.4.5 Data Silos
One concern with the increasing move to multiple distinct access points for confidential data is the “siloing” of data. The critical symptom is the physical separation of files in distinct secure data enclaves. The underlying cause is incompatible legal restrictions on different data sources. Typically, these restrictions impose administrative barriers to combining data sources for which linking is technically possible.
Such administrative barriers may also be driven by ethical or confidentiality concerns. The question of consent by survey or census respondents may explicitly prevent the linkage of their survey responses or of their biological specimens with other data. For example, the Canadian Census long form of 2006 offered respondents the option either to answer survey questions on earnings or to consent to the linkage of their tax data on earnings. In the 2016 census, the question was no longer asked, and respondents were simply notified that linkage would happen.
In the case of the LEHD data, as of December 2015, all 50 states as well as the District of Columbia had signed agreements with the Census Bureau to share data and produce public-use statistics. It would thus seem possible for researchers to access a comprehensive LEHD jobs database through the FSRDC network, by linking together the job databases from 51 administrative entities. However, all but 12 of the states had declined to automatically extend the right to use the data to external researchers within the FSRDC network. Nevertheless, some of the same states that declined to provide such permission in the FSRDC give access to researchers through their state data centers or other means. The UI state-level data is thus siloed, and researchers may be faced with nonrepresentative data on the American job market. Several European projects, such as Data without Boundaries (DwB), have investigated cross-national access with elevated expectations but relatively limited success (Schiller and Welpton 2014; Bender and Heining 2011). Increasingly, the U.S. Census Bureau and CASD also host data from other data providers, through collaborative agreements, moving toward a reduction of the siloing of data.
Secure multiparty computing may be one solution to this problem (Sanil et al. 2004; Karr et al. 2005, 2006, 2009). However, implementation of such methods, at least in the domain of the social and medical sciences cooperating with NSOs, is in its infancy (Raab, Dibben, and Burton 2015). The typical limitations are the throughput of the secure interconnection between the sources and the requirement of manual model output checking. These limitations drastically slow down any iterative procedure.
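To make the idea concrete, the following is a minimal sketch of secure summation by additive secret sharing, one simple building block of secure multiparty computation. It is an illustration only and is not claimed to be the protocol used in the works cited above. The function names (`make_shares`, `secure_sum`), the modulus, and the job counts are hypothetical, and the code simulates all parties in a single process rather than over a real secure interconnection.

```python
import secrets

# Public modulus agreed on by all parties (an assumption for this sketch).
# It must be larger than any total the parties could possibly produce.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that sum to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate a secure summation among len(private_values) parties."""
    n = len(private_values)
    # Step 1: party i splits its private value; share j would be sent to party j.
    all_shares = [make_shares(v, n, modulus) for v in private_values]
    # Step 2: party j adds the shares it received and publishes only that subtotal.
    subtotals = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Step 3: the published subtotals combine to the true total.
    return sum(subtotals) % modulus


# Hypothetical job counts held by three separate data custodians.
counts = [1200, 845, 4310]
print(secure_sum(counts))  # 6355
```

Each custodian's count stays hidden behind its random shares, while the total is recovered exactly. In a real deployment every call to `secure_sum` corresponds to at least one round of communication between enclaves, so an iterative estimation procedure that needs many such sums per iteration is bound by the throughput limitation described above.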