GIS Research Methods - Steven J. Steinberg
Difficulties with the G
The geographic context may be difficult to collect because determining the exact location of a piece of data on the ground is not always easy to accomplish or, for reasons of privacy, may not be permissible. When mapping people, we face an additional challenge: people may move around, may be without a home, or otherwise may be difficult to tie to a particular location. However, because geographic data are the heart of GIS, knowing a location of some kind is an essential part of the GIS process (even if it must be spatially degraded or detached from the exact, true location).
For example, if you were doing a study of homeless individuals, it might be better to define their location at the level of a particular neighborhood they call home than at a particular street address. Furthermore, even in studies where mappable locations are available, privacy issues may necessitate degrading that information. In other words, even if you have specific addresses of your respondents, you might choose to degrade the data to census blocks, to neighborhoods, or even to the city level to maintain the privacy required for ethical research. Choosing the level of spatial detail is an important part of the GIS process.
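The choice of spatial detail described above can be illustrated with a minimal Python sketch. It assumes each respondent record already carries an identifier for every spatial level; the records, field names, and the `degrade` helper are all hypothetical and stand in for whatever your own dataset provides.

```python
from collections import Counter

# Hypothetical respondent records: every name and identifier below is invented
# for illustration and does not come from a real dataset.
RESPONDENTS = [
    {"address": "12 Elm St", "block": "B-101", "neighborhood": "Northside", "city": "Exampleville"},
    {"address": "14 Elm St", "block": "B-101", "neighborhood": "Northside", "city": "Exampleville"},
    {"address": "3 Oak Ave", "block": "B-102", "neighborhood": "Northside", "city": "Exampleville"},
    {"address": "77 Pine Rd", "block": "B-250", "neighborhood": "Harbor", "city": "Exampleville"},
]

# Levels ordered from most to least spatially detailed.
LEVELS = ("address", "block", "neighborhood", "city")


def degrade(records, level):
    """Return respondent counts per spatial unit at the chosen level.

    Only the counts are returned, so anything finer than `level`
    (for example, street addresses) is absent from the output.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return dict(Counter(record[level] for record in records))


block_counts = degrade(RESPONDENTS, "block")
neighborhood_counts = degrade(RESPONDENTS, "neighborhood")
city_counts = degrade(RESPONDENTS, "city")

print(block_counts)
print(neighborhood_counts)
print(city_counts)
```

Each coarser level protects privacy more strongly but answers fewer questions: the city-level output can no longer show that three of the four respondents live in the same neighborhood.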
Conceptually mapped features, such as data about perceptions, ideas, or interactions (figure 2.1), are perhaps more difficult to map, although they are as important as physical locations in research. For example, social networks or interactions between individuals may be mapped in such a way that people who are emotionally close would be located conceptually close together, whereas individuals who are casual acquaintances might be mapped at a greater distance. Lines connecting people on the map could represent social distance rather than true geographic distance. On such maps, referred to as cartograms, the distance between mapped data is scaled to a variable or index value other than distance. In the case of social ties, this might be an index representing the strength of a particular relationship.
Figure 2.1 University of the Arctic thematic networks map. This map presents collaboration networks between higher education and research institutions in the northern hemisphere. The map’s goal is to illustrate the networks of activity and the geography that these networks cross. Courtesy of Hugo Ahlenius and Veli-Pekka Laitinen. From Esri Map Book, vol. 28 (Redlands, CA: Esri Press, 2013), 29. Data from University of the Arctic.
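As a rough illustration of scaling map distance to a relationship index, the following Python sketch converts a tie-strength value into a conceptual distance. The names, the 0 to 1 strength scale, and the linear conversion rule are all assumptions made for this example; a real study would choose an index and a scaling rule suited to its data.

```python
# Hypothetical tie-strength index between pairs of people, on a 0-1 scale
# (1 = closest relationship). Names and values are invented for illustration.
TIE_STRENGTH = {
    ("Ana", "Ben"): 0.9,  # emotionally close
    ("Ana", "Cai"): 0.5,
    ("Ben", "Cai"): 0.1,  # casual acquaintances
}


def conceptual_distance(strength, max_distance=10.0):
    """Convert a 0-1 tie strength to a map distance: stronger ties plot closer.

    The linear rule is an assumption chosen for simplicity.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return max_distance * (1.0 - strength)


distances = {pair: conceptual_distance(s) for pair, s in TIE_STRENGTH.items()}

# List the pairs from conceptually closest to most distant.
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))
```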
A second difficulty in GIS mapping relates to the variability that occurs in time and space. Most data are collected as a snapshot in time. We have a more difficult task obtaining data over a span of time to reliably map changes or trends in the data. Furthermore, because many of the things that we may map—especially individual people—will move over time, there is an added dimension of analysis to consider. Do we locate a survey respondent based on her home address or her place of employment, or perhaps based on where the individual is most likely to be at a particular time of the day or week? This decision would be most significantly influenced by the question under study; there are no set answers.
Using computer animation, you can change map data from static to dynamic. However, this type of mapping is still limited by the difficulty and expense of collecting data at a high frequency (temporal scale) as well as by software limitations for incorporating data instantly as they are collected. Fortunately, only a few social science applications necessitate true real-time analysis. Your primary goal as a researcher considering GIS as an analysis tool is to make such decisions before collecting the data.
You also need to consider the spatial representation of your data. Often, in research, privacy is of the utmost concern. Researchers typically lump data to mask individual data points representing individual respondents. Lumping, or degrading, data in this fashion results in a serious trade-off: the true, raw data may be permanently lost and no longer available for future research. As a result, researchers may collect an enormous amount of redundant data when the simple recategorization of existing data in different but equally valuable combinations would have allowed them to explore different questions.
For example, say you are looking at the populations of 426 incorporated cities in California. The cities range in size significantly, from the city of Vernon (population 80) to Los Angeles (population 3.4 million). In examining these cities for research purposes, you should consider numerous methods of categorization. As an illustration, consider an example using five categories for city size, as in figure 2.2.
Figure 2.2 An example of categorical classifications for the size of cities. Size classes can be defined a variety of ways, depending on the objectives and preferences of the map author.
How you choose to organize your data into these categories will have a direct effect on the outcome of analysis. Optimally, you will have access to the actual numbers so that you have a choice in the matter. If not, the metadata should define how cities were assigned to each of these categories.
Typically, your GIS software will have default settings for categorizing and representing these data, as in figure 2.3, which shows portions of Los Angeles and Orange Counties in southern California. The data categorization is based on the defaults used in ArcGIS (a popular GIS software package produced by Esri)—five categories based on the natural breaks within the dataset.
Figure 2.3 A map of city populations symbolized using default settings in ArcGIS. This map illustrates the various population sizes of some southern California cities. Map by Steven Steinberg. Data from US Census and State of California.
Although the defaults in your software may produce a nice map, they may not be appropriate to your study data and objectives. Therefore, it is important to understand and define data categories that make sense for your needs. Perhaps there are legal or regulatory definitions for the sizes of cities you should consider. Or there may be statistical justification for how you examine your data. Changing the categories, of course, changes the map and the analysis results.
The map in figure 2.4 retains the five categories from very small to very large but uses a geometric interval as the basis for the categorization. Notice how the distribution of city sizes appears different on the map.
Figure 2.4 A map with city sizes symbolized using geometric categorization in ArcGIS. Map by Steven Steinberg. Data from US Census and State of California.
And finally, the map in figure 2.5 uses five quantiles, again changing the appearance and categorization of city sizes. Quantiles are a method of classification by which the data are divided into a specified number of categories, each containing an equal number of features (here, cities).
Figure 2.5 A map of city sizes symbolized using five quantiles in ArcGIS. Map by Steven Steinberg. Data from US Census and State of California.
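To see how the classification method alone changes the result, the following Python sketch sorts one list of city populations into five classes three different ways. It is an illustration under stated assumptions, not a reproduction of ArcGIS: the population values are hypothetical (only the extremes echo the range mentioned earlier), the geometric scheme is a plain constant-ratio progression rather than the ArcGIS geometric interval algorithm, and natural breaks are omitted because they require an optimization step.

```python
import statistics

# Hypothetical city populations, invented for illustration. Only the smallest
# (80) and largest (3.4 million) values echo the range mentioned in the text.
POPULATIONS = [80, 1_200, 4_500, 9_800, 23_000, 58_000,
               110_000, 340_000, 900_000, 3_400_000]


def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold about the same number of values."""
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    return cuts + [max(values)]


def geometric_breaks(values, k):
    """Upper bounds of k classes whose bounds grow by a constant ratio.

    A plain geometric progression, not the geometric interval algorithm
    that ArcGIS implements. Requires strictly positive values.
    """
    lo, hi = min(values), max(values)
    ratio = (hi / lo) ** (1 / k)
    return [lo * ratio ** i for i in range(1, k + 1)]


def class_counts(values, breaks):
    """Count how many values fall into each class (upper bounds inclusive)."""
    counts = [0] * len(breaks)
    last = len(breaks) - 1
    for v in values:
        for i, upper in enumerate(breaks):
            # The last class catches the maximum even if rounding nudges its bound.
            if v <= upper or i == last:
                counts[i] += 1
                break
    return counts


equal_counts = class_counts(POPULATIONS, equal_interval_breaks(POPULATIONS, 5))
quantile_counts = class_counts(POPULATIONS, quantile_breaks(POPULATIONS, 5))
geometric_counts = class_counts(POPULATIONS, geometric_breaks(POPULATIONS, 5))

print("equal interval:", equal_counts)
print("quantile:      ", quantile_counts)
print("geometric:     ", geometric_counts)
```

With these sample values, equal intervals place eight of the ten cities in the smallest class and leave two classes empty, whereas quantiles put exactly two cities in each class. The same city can therefore be "very small" on one map and "medium" on another.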
Although all of these examples are drawn from exactly the same dataset, they each represent the data differently. If you receive data that have been categorized in advance, you may find that the data are difficult or impossible to use in a study with a different set of questions. For example, what may be a medium-sized town to the person creating the original dataset may be a small town in your study. Another simple example of data degradation is the grouping of income levels into categories, which is a common practice in survey research. Categorical information, such as <$15,000 and $15,001–35,000, provides no means for a later study to distinguish individuals with incomes between $20,001 and $30,000. In a mapping context, it could be useful to link people or ideas to specific locations, but more commonly, data are collected by larger geographic regions, such as census blocks or other political boundaries. A census block, however, doesn’t show the internal distribution of data within it (e.g., are the households equally distributed across the area, or is clustering of the households hidden in the simplified data?).
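The income example above can be made concrete with a short Python sketch. The income values are hypothetical, and the third category (above $35,000) and the treatment of exactly $15,000 are assumptions added so that every value has a category.

```python
# Hypothetical survey incomes in dollars, invented for illustration.
RAW_INCOMES = [9_500, 14_000, 18_000, 22_500, 27_000, 33_000]


def to_category(income):
    """Assign an income to one of the coarse survey categories.

    Assumption: exactly $15,000 falls in the lowest category, and a third
    category covers everything above $35,000.
    """
    if income <= 15_000:
        return "up to $15,000"
    if income <= 35_000:
        return "$15,001-35,000"
    return "above $35,000"


# What a later researcher receives if only the categories are kept.
binned = [to_category(x) for x in RAW_INCOMES]

# With the raw values, a new question can still be answered directly.
in_new_range = [x for x in RAW_INCOMES if 20_001 <= x <= 30_000]

print(binned)
print(in_new_range)
```

In the binned list, $18,000 and $22,500 carry the same label, so no later analysis can tell which of them falls inside the $20,001 to $30,000 range.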
Where data are provided in categorical form using category definitions that do not meet your requirements, you may need to locate an alternative data source or even collect your own primary data. Data that are degraded can no longer be recategorized to explore new or different questions. Of course, these are not simple issues to address because anonymity is an essential component of many social science questions; however, to the extent possible, when data are maintained in near-original, detailed form, the possibilities for analysis both within and outside the GIS are much greater.