-
In the reading today (Stevens et al.) the authors use a technique to produce a high resolution description of the distribution of human populations across the globe. What is the name of the technique and describe in general and basic terms how it works?
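- The authors used a random forest estimation method. It is a semi-automated dasymetric modeling approach that combines census data with ancillary data to estimate population density at high resolution. A random forest, which is an ensemble of decision trees, is trained on the population densities of census units with the ancillary data as predictors. Its predictions for each grid cell then serve as a weighting layer for redistributing each census unit's population count among the grid cells inside that unit.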
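The redistribution step can be illustrated with a minimal sketch. It assumes the weighting layer already exists; the function name `redistribute`, the census total, and the per-cell densities are hypothetical values chosen for illustration and are not taken from the paper.

```python
def redistribute(census_total, weights):
    """Split one census unit's count across its grid cells in proportion to the weights."""
    total_weight = sum(weights)
    if total_weight <= 0:
        # No usable weights: fall back to an even split across cells.
        return [census_total / len(weights)] * len(weights)
    return [census_total * w / total_weight for w in weights]


# Hypothetical census unit: 1,000 people and four grid cells.
# The weights stand in for densities predicted by a trained model.
predicted_density = [8.0, 1.0, 0.5, 0.5]
cell_counts = redistribute(1000, predicted_density)
print(cell_counts)  # [800.0, 100.0, 50.0, 50.0]
```

Because the weights are only used as proportions, the cell counts always sum back to the census total for the unit, which is what keeps a dasymetric map consistent with the census.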
-
The random forest method used by the authors is a machine learning algorithm (ensemble method). In general terms, what is a machine learning algorithm? Within the context of this study, what distinguishes a data science, machine learning method (such as random forest) from previous classical statistical approaches to describing and analyzing phenomena and events?
- Machine learning algorithms are programs that use mathematics, specifically statistics and logic, to make predictions and estimates from data. The "learning" refers to the algorithm adjusting its internal parameters as it is given more data, so its performance generally improves as the training data grows. This is what distinguishes a machine learning method such as random forest from classical statistical approaches: the algorithm learns the relationship between the predictors and the outcome from the data itself, whereas classical approaches start from a formula or model that is specified in advance, for example in Bayesian methods.
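The ensemble idea can be sketched with bootstrap aggregation (bagging) of one-split regression trees. This is a simplified stand-in and not a full random forest, which grows deeper trees and also samples the covariates at random at each split. The single covariate (distance to a road in km), the density values, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every distinct value except the largest is a valid split
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((y - l_mean) ** 2 for y in left) + sum((y - r_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    if best is None:  # sample had one distinct x value: always predict the mean
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    model = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)
        model.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return model


def predict(model, x):
    """Average the predictions of all stumps in the ensemble."""
    return mean(l if x <= t else r for t, l, r in model)


# Hypothetical training data: distance to road (km) and population density.
distance_km = [0.5, 1, 2, 3, 5, 8, 12, 20]
density = [900, 850, 600, 400, 200, 90, 40, 10]

model = fit_bagged_stumps(distance_km, density)
near = predict(model, 1.0)
far = predict(model, 15.0)
print(near, far)
```

No formula relating distance to density is written down anywhere in the code; the split points and predicted values come entirely from the training data, and averaging many stumps smooths out the quirks of any single bootstrap sample.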
-
In the reading, the authors use a number of geospatial covariates as predictors in their machine learning method. What were these geospatial covariates and approximately how big of a data set did they represent (in general terms)? What is the significance of big data in the estimation of machine learning methods for inferring the correlates and drivers of human population distributions?
- The authors used geospatial covariates, including distance-based covariates, as predictors in order to model the relationship between those covariates and population density. Big data is highly significant for machine learning methods because the amount and quality of the data directly affect the accuracy of the predictions and of the inferences drawn about the correlates and drivers of population distribution.
-
- Having a highly accurate description of where each person is located is crucial to the design, planning, and implementation of interventions during crises. More effective interventions built on highly accurate data can accelerate human development, especially in LMICs. This data can also be used to measure and monitor population growth and its impact on the world.
-
Within the context of human development in LMICs, what is the relevance to your area of investigation in having a highly accurate description of where each household and person is located across planet earth?
- My investigation focuses on malaria prevalence in Sub-Saharan Africa. Having a highly accurate description of where each household and person is located is crucial to mapping and tracking human networks, because human movement and contact are a major route by which diseases spread. Big data such as call detail records (CDRs) can indicate where people are and when they are there, which is useful for mapping human connectivity. Knowing where people are concentrated, and therefore which regions are at high risk for the disease, is very useful in planning and implementing interventions aimed at eliminating malaria.