STATISTICS AND HUMAN RIGHTS

100 YEARS FROM NOW

INTERVIEWER: Suppose that time travel were possible, and you could take one trip. You can only observe, not change anything, when you get there. Would you travel to a time in the past or in the future?

JESSICA UTTS, University of California, Irvine professor emerita and past president of the American Statistical Association: I think I would travel about 100 years into the future. I’m very curious about what the status of lots of issues will be by then, including education, global warming, human rights, sources of energy, family composition, the status of nations, and so on.

Enhancing Data Science Ethics Through Statistical Education and Practice

International Statistical Review (2021) doi:10.1111/1751-5823.12446

Jessica Utts, University of California, Irvine, California, USA

Summary

As sources of data become more plentiful and massive datasets are easier to acquire, new ethical issues arise involving data quality and privacy, and the analysis, interpretation and dissemination of data-driven decisions. There are numerous anecdotes involving abuses of complex data analyses and algorithms, and the impact they have had on society. In this paper, we discuss what statisticians can do to help enhance data science ethics in practice and what statistics educators can do to instill sound ethical behavior in our students. We have opportunities to practice and teach ethical conduct relevant to all stages of the data life cycle. This paper discusses issues impacting ethical data science, with a focus on how statisticians can help raise awareness and encourage implementation of ethical best practices.

Introduction

Early in my teaching career, there was a PhD student in one of my classes who had spent time as a journalist and social activist before returning to school. He told me the following memorable story. One day in an engineering class, the professor gave the students an assignment. They were to break into small groups and discuss how to design a pipeline to send blood from a poor developing nation to a rich developed one. The students got to work, discussing the optimal diameter for the pipe, how to go under a body of water, methods for keeping the blood fresh and so forth. After reconvening the class and hearing the ideas from each group, the professor announced that they had all failed the assignment. The startled students demanded to know why. The professor explained that the true reason for the assignment was to see if any one of them would question the ethics of the assigned task. ‘But’ protested a student, ‘this is a class in engineering, not ethics!’
At the time I thought the story was apocryphal, but I found two recent posts containing the story that lead me to believe it was real (Cantor 2020; Steinberg 2018). Whether it is real or not, the moral is clear. We need to train our students and practitioners to ask ‘why’ before asking ‘how’. As statisticians and data scientists, we need to question the ethics of our work. We need to ask who benefits and who might be hurt. We need to consider the ways in which results of our work might be biased or might be presented in a misleading way to consumers of the work. These are some of the many aspects of data science for which those trained as statisticians are the most qualified to make contributions.
The exponential growth in data science in the past decade has led to more opportunities for statisticians, but with opportunity comes responsibility. As a profession, we need to put more emphasis on ethical guidelines and procedures. Many national and international statistical professional organizations, including the International Statistical Institute, the American Statistical Association and the Royal Statistical Society, have professional ethical guidelines or codes of conduct, which are updated every few years. But those guidelines have previously been limited to roles traditionally held by statisticians. In recent years, the jobs our graduates are getting are more complex and extend beyond the boundaries of traditional jobs in statistics. For instance, the job may involve working on teams to develop machine learning and artificial intelligence algorithms. It is almost surely the case that future jobs for statisticians will involve processing larger amounts of data than has been the case in the past and data from sources that are not well understood or vetted.
It is not clear whether data science can be defined or established as a separate profession, with many data science projects carried out as teamwork by professionals with varied backgrounds. As argued by Detlef Steuer (2020) in an opinion piece in Significance magazine, it is difficult to develop a set of professional ethics when there is no defined profession. In a more technical version of Steuer's argument, Garzcarek and Steuer (2019) note that ‘For the individual data scientist, the translation from very general ethical principles from common morality, law or religion, to an ethical issue at work can be quite difficult. Especially since most issues are not about intentions, but about the consequences of one's work’. That insight amplifies the necessity for more guidance on ethics for data science.
In late 2020, the Royal Statistical Society established a Section on Data Ethics and Governance (https://rss.org.uk/membership/rss-groups-and-committees/sections/data-ethics/). The new section grew from a Data Ethics Special Interest Group, which had been established 3 years earlier. The news story announcing the new section explained that part of the motivation was as follows.
“With the introduction of ever more powerful applications of AI and with the development of data science as a discipline comes the inherent danger that we may inadvertently harm or stigmatize individuals, groups, or communities and exacerbate structural inequalities.” (https://rss.org.uk/news-publication/news-publications/2021/section-group-reports/data-ethicsspecial-interest-group-becomes-fully-f/)
In the same news announcement, it was noted that including ‘governance’ as part of the section name ‘reflects the importance we attach both to understanding and to the practical implementation of ethical safeguarding within the work of statisticians and data scientists’. In other words, there is recognition that the ethical implications of data science extend beyond data science itself, to include the consequences of the implementation of data science results in society.
The intent of this paper is to focus on the areas of data science for which statisticians currently have the most expertise. Before concentrating specifically on the role of statisticians, Sections 2 and 3 include a discussion and some examples of the broader issues surrounding data science ethics. Section 4 highlights the areas for which statisticians are most qualified to contribute to ensuring ethical work. Section 5 provides guidance for helping to create a statistically literate society, noting that the essential principles of data literacy are fundamentally the same as long-established principles of statistical literacy. That endeavour is important, because if consumers of data science results understand the possible ethical issues, they can help ensure that the producers of those results adhere to ethical guidelines. Therefore, it is the responsibility of statistical educators to include literacy and ethics as part of the statistics curriculum at all levels.

The Complex State of Data Science Ethics: The Big Picture

There are numerous books, scholarly articles and stories in popular media giving examples of unethical data science or of algorithms that are biased or harmful to certain groups. In many cases, the bias or harm was not intentional but, rather, was the result of inadequate human intervention and oversight. Two books written for general audiences that are good sources of examples are Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (O'Neil 2016) and Hello World: Being Human in the Age of Algorithms (Fry 2018).
In a research article addressing the topic, Garzcarek and Steuer (2019) provide several illustrations of algorithms that have been shown to be biased. But, as they note, it is often difficult to explain how the bias arises, because algorithms generally are seen as black boxes. And if there is no obvious source of bias in the data used to create the algorithm, it can be argued that there must not be any bias in the results. An example mentioned by them and elaborated upon by O'Neil (2016) is the use of algorithms to predict recidivism for someone who has been arrested.
One of the earliest such tools was COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), renamed ‘Equivant’ in January 2017. The tool was developed in 1998 by a statistics professor and a correctional facility administrator. According to Dressel and Farid (2018), by 2017, COMPAS had been used to assess more than a million offenders. It relies on answers to over 100 questions about the individual's life and criminal background and predicts the person's risk of reoffending within 2 years of the assessment.
While race is not explicitly used in the algorithm, it apparently does use answers to questions that can be proxies for race. These include things such as at what age the offender first encountered the police and whether the offender has other family members who have been arrested. Addressing the concept to statisticians, Garzcarek and Steuer (2019, p. 4) write, ‘In statistical language they form a prior belief on an individual generated by experience with other individuals assigned to the same group’. We return to the COMPAS example in Section 3.3.
Some common examples of where bias has been shown to enter into algorithmic decision-making include the following general domains:
  • algorithms (such as COMPAS) used by judges to decide who is likely to reoffend;
  • bias in hiring algorithms, based on bias in training data;
  • financial algorithms used to decide who should get loans based on geographic and other aggregate data;
  • medical diagnostic algorithms, trained on data excluding certain subpopulations; and
  • facial recognition software, shown to work best for Caucasian males and to include substantial errors for dark-skinned individuals.
A good source of discussion and analysis of the ethics of artificial intelligence and machine learning can be found through the AI Now Institute at New York University (https://ainowinstitute.org/), self-described as ‘A research institute examining the social implications of artificial intelligence’. From 2016 to 2019, they published an annual report, including recommendations for researchers, corporations, governments and policymakers. (They opted not to publish an annual report in 2020, focusing instead on specific projects.) For example, here are three of the recommendations from their 100-page 2019 annual report that could have implications for data scientists (Crawford et al. 2019).
  • Recommendation 4: AI bias research should move beyond technical fixes to address the broader politics and consequences of AI's use.
  • Recommendation 8: States should craft expanded biometric privacy laws that regulate both public and private actors.
  • Recommendation 11: Machine learning researchers should account for potential risks and harms and better document the origins of their models and data.

Some Important Ethical Guidelines for Data Scientists

This section provides some guidance on broad-based issues related to data science ethics. The ideas in this section should be kept in mind by data scientists, especially when working as part of a multidisciplinary team. Often it is the statistician or data scientist who has the least vested interest in the outcome of a project and the best perspective about what ethical issues might arise.
Of course statisticians have been working as collaborators on multidisciplinary teams for decades. As noted by Gibson (2019, p. 109), ‘Statistical leadership is the use of influence to guide a multidisciplinary team to adopt the best design or decision based on the available data’. But as projects become more complex, the role of the collaborating statistician needs to expand. Gibson (2019, p. 113) notes, ‘The rapid growth of big data creates new opportunities for statisticians to collaborate on issues related to minimizing bias, false discovery, and generalizability of results from data that is not [a] sample and may represent almost the entire population’. He offers examples from his experience in the pharmaceutical industry and gives recommendations for how statisticians can provide leadership even when they are not in official leadership positions.
The topics in this section can be discussed with teammates to make sure everyone is focused on the ethical implications of the team's work. Statisticians and data scientists can and should play a leadership role in these discussions.

3.1 Team Responsibility for Discussions of Ethics

Everyone on a multidisciplinary team should take responsibility for ethical issues. It is important to share ideas and concerns early in the development of a project. As noted by the example of the blood pipeline at the beginning of this paper, it is important to ask ‘why’ before asking ‘how’ and to consider the ethical implications of the research or product.
For example, suppose you are part of a team that's developing a GPS map program that will inform drivers in real time about what route is best. It would be very important to define what's meant by ‘best’. Here are some questions that might be considered. If a driver is about to leave a major sports event with thousands of other drivers, should the algorithm divide up the traffic so that not everyone is taking the same route? Is it ethical to send drivers on routes through high-crime neighborhoods? Is it ethical to identify high-crime neighborhoods, thus perhaps labelling them with an unfair stigma? Is it ethical to send drivers through residential neighborhoods, creating traffic issues for those neighborhoods? What about school zones when children are leaving school? And what if the GPS program is used for pedestrians or bicyclists? Is it ethical to recommend that they walk or bike through high-crime areas, or areas known to be unsafe for them because of dangerous car traffic?
None of the questions given in the previous paragraph are technical. They are ethical considerations that should be addressed before developing the algorithms, and the answers should focus on ethics rather than on profit.

3.2 Transparency and Black Box Mysticism

There is a mysticism surrounding computers that may lead to more credibility than is warranted for results of machine learning algorithms. The ‘black box’ allure sometimes results in users giving more credibility to recommendations or results determined by an algorithm than to those determined by a human, because computers are thought to be infallible. In general, users do not understand the concepts of statistical variability and bias that almost always accompany training data. The problem is exacerbated by the fact that many algorithms are proprietary, so users are not provided with information on what data were used to develop them.
For these reasons, it is important to provide as much information as possible about what information was used to develop and test an algorithm and to explain that there is likely to be statistical uncertainty associated with the results. This recommendation is of course what always is (or should be) taught in statistics education, namely, know and report exactly how data were collected in order to interpret analyses and investigations, and to allow others to build on your work.

3.3 The Last Step in Any Algorithmic-based Decision Should Be a Human Expert

Algorithms are developed to make decisions that work on average, not necessarily for every individual. Examples abound on how algorithms have resulted in unfair and possibly dangerous decisions for individuals. The books by Cathy O'Neil (2016) and Hannah Fry (2018) mentioned earlier provide many examples. As algorithm-based decisions become commonplace, it is crucial that every decision be reviewed by a human.
A related issue is the question of whether an algorithm works better than human judgement. Surprisingly, that question often is not answered before an algorithm gains widespread use. An interesting example that has been the center of extended controversy is the COMPAS algorithm mentioned earlier, which has been used to predict recidivism for over a million criminal offenders. Dressel and Farid (2018) decided to test whether the algorithm would do better than either individual experts or the collective decision of non-experts. Their results are summarized as follows:
“We show that the widely used commercial risk assessment software COMPAS is no more accurate or fair than predictions made [collectively] by people with little or no criminal justice expertise. In addition, despite COMPAS's collection of 137 features, the same accuracy can be achieved with a simple linear predictor with only two features. (Dressel & Farid 2018)”
In an article in The Atlantic, Yong (2018) summarized that study as well as a study by ProPublica (Angwin et al. 2016; Larson et al. 2016) alleging racial bias in COMPAS. Yong also presents the rebuttal argument from the company that now owns the algorithm as well as references to other academic analyses. The point of this example is not to single out COMPAS but, rather, to illustrate that humans should not be left out of the loop when using algorithms to make decisions.

3.4 Algorithms May Provide Information Useful for Making Positive Changes

As all students of statistics learn early in their education, correlation does not imply causation. It is important for data scientists to think about whether associations seen in data analysis and machine learning results might provide information useful for making positive reforms. For instance, suppose a judge must decide whether to release an accused offender on bail and uses an algorithm to predict whether the person will show up for their court hearing. Further, suppose an analysis shows that women who have small children are less likely to show up for their hearing than are other women of the same age. Should that information be used to argue in favour of denying bail to women with small children? It seems much more humane and logical to surmise that they may not show up because they have no one to care for their children.
That logic could lead to a program to provide child care for women when they are required to be in court.

3.5 Trade-off Between the Good of Society and the Rights of the Individual

Sometimes decisions in data science ethics require a trade-off between benefiting society and protecting individual rights. For example, there has been ongoing debate on the ethics of allowing researchers to have access to health records in the UK National Health Service, even when individual records cannot be identified (Cheung 2020; Ford et al. 2019).
A more straightforward example occurred in the spring of 2018 when DNA from a publicly available database was used to find and arrest a man who had committed a string of rapes and murders 40 years earlier. A website called GEDMatch.com had been established for people to voluntarily register their DNA collected for genealogical searches. (GED is short for Genealogical Data. A GEDCOM file is a generic format, shorthand for ‘genealogical data communication’.) The intention of the website founder was to provide a resource for people doing genealogy as a hobby to find close and distant relatives. But Sacramento (California) law enforcement investigators used the site to find a man who had committed multiple rapes and murders decades earlier, using DNA they had retained from the crimes. They matched the perpetrator's DNA to a distant relative on the GEDMatch website and then figured out who the perpetrator was by using other genealogical tools available to anyone.
There were no laws or policies preventing law enforcement use of the GEDMatch database for this purpose, and they were not granted any special access. A few months later, GEDMatch gave users the option of whether to allow their DNA data to be used for criminal investigations. Each user was asked to opt in if they wanted to allow this; otherwise, their DNA could not be used for criminal investigations. Clearly, such use would benefit society, and it is essential for users to understand the consequences of their decision and hopefully allow possible use of their DNA by law enforcement. The email from GEDMatch that explained the new opt-in policy strongly encouraged users to do so.

3.6 A Recommendation for Programs in Statistics and Data Science

The principles listed in this section should be part of all aspects of statistics and data science degree programs, especially for graduate programs, whose students are likely to hold positions of greater responsibility than those from undergraduate programs. Although some programs may have separate courses on ethics, it is important to integrate ethics throughout the curriculum. One method for accomplishing this goal is to add a discussion of ethics to all assignments for which that makes sense. For instance, an assignment to conduct and summarize a data analysis could include the requirement that the report contains a discussion of ethical implications. With that kind of training, students may remember to ask and answer questions about ‘why’ before answering questions of ‘how’.

How Statisticians Can Contribute to Ethical Data Science in Practice

Statisticians traditionally have been trained in areas that can help foster ethical data science. When working as part of a research team, it is important for statisticians to speak up about violations of good practice, even if those violations are unintentional on the part of other team members. In this section, we discuss four aspects of research that statisticians are well placed to address and then present a few examples to illustrate them. These four aspects include general data issues, planning a study, the analysis phase and ethical reporting of results.

4.1 Ethical Consideration of Data Issues

In clinical trials and other designed experiments, standards such as informed consent and data privacy are well established. But this is not necessarily the case in observational studies, especially when data sources include web scraping, purchasing data or harvesting data collected by one's employer. Similarly, concepts such as random sampling and random assignment are not likely to be part of most observational studies. Convenience samples are often used, and explanatory variables are likely to occur naturally and thus be confounded with other variables. Here is a list of some of the ethical data-related issues statisticians should address when working with a research team:
  • Is informed consent possible, and if so, is it used?
  • Is anonymity guaranteed, especially if the dataset will be merged with other datasets in ways that may create subsets with one or only a few members?
  • Are there structural biases built into the data, such as when zip codes or post codes are used as a proxy for ethnicity?
  • Are certain subgroups disadvantaged, for instance, because of past discriminatory behaviours?
  • In designed studies, are dropouts related to the research questions or treatments?

4.2 Ethical Considerations when Planning a Study

Many decisions need to be made when planning a study, even if it is not a designed experiment. While some of these decisions may not seem to be related to ethics, any decision that undermines the validity of the results should be considered as an ethical issue. And in many cases, the ways in which results can be undermined would not be obvious to anyone not familiar with all of the details of implementation. The following issues and examples illustrate how this can happen.

4.2.1 Ensuring ecological validity

Ecological validity refers to the idea that the way the study is conducted should mimic the real world, so that the results of the study would hold in situ. For example, if a study to test the efficacy of nicotine patches to help people quit smoking included additional encouragement to the participants that would not be available to patch users in the future, the effect of the encouragement would be confounded with that of the nicotine patch. That would reduce the ecological validity of the study.
I was once a consultant on an experiment to test whether using ‘time of use’ electric rates would encourage agricultural customers to reduce their electricity usage during peak hours. There were two electric companies involved in the study. They each had a representative whose job was to meet with the customers and explain the program to them. Unfortunately, one of the representatives thought of the study as a competition to show that his company would get better results than the other company. So he worked very hard to motivate the customers he met with, encouraging them to reduce their electric use during peak hours. Because such encouragement would not be part of the eventual implementation of the new rates, it interfered with the ecological validity of the study.

4.2.2 Ethics of intervention without consent

In a study jointly conducted by a PhD student at Cornell and a Facebook employee in 2012, almost 700 000 Facebook users were randomly assigned to four treatment groups without their explicit consent. One group had their negative news feed reduced, while another had their positive news feed reduced. Control groups had news feed randomly omitted. The study lasted for 1 week. The outcome measure was the frequency of positive and negative words in the users' own posts. The hypothesis was that seeing less positive news would result in greater use of negative words in users' posts and seeing less negative news would result in increased use of positive words in those posts.
The results were published in the Proceedings of the National Academy of Sciences (Kramer et al. 2014) and received widespread media attention. According to Altmetric data, the study was mentioned by 337 news outlets, putting it into the top 5% of all ‘research outputs’ scored by them at that time. The results will be covered as Example 1 in Section 4.5. The point here is that almost 700,000 people were participants in an experiment with the potential to cause them psychological harm without their explicit consent. In an editorial, the journal defended publication of the work because it did not violate Facebook's privacy policy (Verna 2014).

4.2.3 Ethics of conducting a power analysis before (or after) data collection

It is a waste of resources to conduct a study that is underpowered. It is the ethical responsibility of statisticians to raise the question of power whenever they are involved in research before data collection. The problem is compounded by the fact that most users of statistical studies do not understand the difference between failing to reject the null hypothesis and concluding that the null hypothesis is true. Therefore, a study with low power may lead to the conclusion that there is no effect or difference, when in fact the sample size was not large enough to detect an existing effect.
If a study has been conducted before the statistician becomes involved, it is the responsibility of the statistician to raise the question of whether it was sufficiently powered. This is of course especially important if the conclusion of the study is that there is no significant effect or difference.
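To make the power question concrete, the following minimal sketch computes the approximate power of a two-sided, two-sample comparison using a normal approximation. The function name, effect size and sample sizes are hypothetical choices for illustration only, not values from any study discussed here.

```python
from scipy.stats import norm

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a
    standardized effect size (Cohen's d), via a normal approximation."""
    se = (2.0 / n_per_group) ** 0.5      # SE of the standardized difference in means
    z_crit = norm.ppf(1 - alpha / 2)     # two-sided critical value
    shift = effect_size / se             # noncentrality of the test statistic
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Hypothetical numbers: a 'small' effect (d = 0.2) is badly underpowered with
# 20 units per group (power near 0.10) but reaches roughly 0.80 with 400 per group.
for n in (20, 100, 400):
    print(n, round(two_sample_power(0.2, n), 2))
```

A report of ‘no significant difference’ under the first scenario says very little about whether an effect exists.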
As an example, Hauer (2004) explains how legislators in the USA were misled into thinking that it was safe to allow drivers to turn right at a red light. The problem started in 1976 when a consultant for the state of Virginia did a small study at 20 intersections and found no statistically significant difference in accident rates with and without allowing right turn on red. A Virginia government official misinterpreted the consultant's report and sent a letter to the governor that read ‘No significant increase in traffic crashes has been noted following adoption of right turn-on-red’. Of course the consultant's report meant that no statistically significant increase was found. In fact, the consultant found that there were almost one and a half times as many personal injury accidents when right turn on red was allowed as when it was not. But the study was too small for that difference to be statistically significant. The government official did not understand the difference between the technical and ordinary uses of the word significant. It was not until many states had passed laws allowing right turn on red that a study was done in 1981 with enough power to discover that this practice was indeed dangerous. For more details, see Utts and Heckard (2022, pp. 539–540).

4.3 Ethical Considerations in the Analysis Phase of a Study

In recent years, there has been extensive discussion of the concept of p-hacking, especially in the psychological sciences research community. The practice of fishing for something of significance in the results of a study has been known to statisticians (and discouraged) for many years but has come under enhanced scrutiny lately. That is just one example of the ethical problems that may occur in the analysis phase of a study.

4.3.1 Questionable research practices

A survey of academic psychologists described in Psychological Science in 2012 found that a very large percentage of respondents had participated in one or more ‘questionable research practices’ (John et al. 2012). These included actions like stopping data collection early when a significant result had been found, reporting only findings that were statistically significant, deciding whether to exclude some data after looking at how it affected results and numerous other practices. The purpose of most of these practices was to create results that could be published as statistically significant.
Although the 2012 survey included only psychologists, the practices reported are commonplace across the sciences. An extreme case of questionable research practices led to the resignation of a prominent food scientist when it was discovered that he was training his colleagues and students to keep looking until they found a way to get statistical significance. He apparently did not think he was doing anything wrong (Lee 2018). The investigation of his work resulted in the retraction of over a dozen papers, including six in JAMA, as well as published corrections for many other papers. But most cases are not so obvious and do not lead to retractions and resignations. In fact, the survey by John et al. (2012) found that many respondents did not think the questionable research practices described were wrong. It is incumbent on statisticians to make sure analyses are conducted according to sound statistical principles and reported fully and honestly.
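One of the practices reported by John et al. (2012), stopping data collection as soon as a significant result appears, can be shown by simulation to inflate the false-positive rate well above the nominal level. The sketch below is an illustration under assumed settings (normal data, a test after every 10 observations, a cap of 100 observations per group); it is not drawn from the survey itself.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def false_positive_rate(peek_every=10, n_max=100, n_sims=2000, alpha=0.05):
    """Under a true null (both groups from the same distribution), test after
    every batch of observations and stop as soon as p < alpha."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(size=n_max)
        b = rng.normal(size=n_max)
        for n in range(peek_every, n_max + 1, peek_every):
            if ttest_ind(a[:n], b[:n]).pvalue < alpha:
                hits += 1
                break
    return hits / n_sims

# With repeated peeking, the chance of a spurious 'significant' finding is
# roughly 0.15-0.20 rather than the nominal 0.05.
print(false_positive_rate())
```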

4.3.2 Using inappropriate Bayesian priors

In creating and applying Bayes factors for hypothesis tests, there are two sets of priors that come into the analysis. Most people who have any familiarity with Bayesian methods are aware that they include prior probabilities for the null and alternative hypotheses. But what is often hidden in the analysis is the prior distribution placed on values in the alternative hypothesis, which can lead to erroneous conclusions. For example, in testing something that has a small effect size, suppose the prior distribution used in the alternative hypothesis covers mostly large effect sizes. Then loosely described, the use of a Bayes factor to determine whether the data are more consistent with the null hypothesis (a zero effect size) or the alternative hypothesis (large effect sizes) will almost surely result in the null hypothesis winning. Even using a non-informative prior, which spreads the effect across the range of the alternative hypothesis, is unethical if it is known in advance that if an effect exists, it will be a small one. For an example of this problem applied to testing for the existence of extrasensory perception, see Wagenmakers et al. (2011) and the rebuttal by Bem et al. (2011).
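The sensitivity of a Bayes factor to the prior placed on the alternative can be illustrated with a simple normal-mean example. The sketch below assumes a known standard deviation and a zero-centred normal prior on the effect under the alternative; the observed mean, sample size and prior scales are hypothetical numbers chosen only to show the pattern described above.

```python
import numpy as np
from scipy.stats import norm

def bf01(xbar, n, sigma=1.0, tau=1.0):
    """Bayes factor BF01 for H0: mu = 0 versus H1: mu ~ N(0, tau^2), with a
    normal likelihood and known sigma: BF01 = p(xbar | H0) / p(xbar | H1)."""
    se = sigma / np.sqrt(n)
    m0 = norm.pdf(xbar, loc=0.0, scale=se)                        # marginal likelihood under H0
    m1 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(se**2 + tau**2))   # marginal likelihood under H1
    return m0 / m1

xbar, n = 0.1, 200   # a small observed effect (0.1 standard deviations)
print(round(bf01(xbar, n, tau=1.0), 2))   # wide prior on the effect: BF01 around 5, 'support' for the null
print(round(bf01(xbar, n, tau=0.1), 2))   # prior concentrated on small effects: BF01 near 1, no such support
```

The data are the same in both calls; only the prior on the alternative changes, and with it the apparent strength of evidence for the null hypothesis.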

4.3.3 Some other data analysis issues with ethical implications

Many users of statistics do not understand the concept of multicollinearity and may need the guidance of a statistician to correctly interpret coefficients in multiple regression. It is common in medical and other subject-matter literature to see a list of regression coefficients in a table with p-values and stars indicating which ones are statistically significant. Often an attempt is made in the text to explain why these variables are significant or to explain why others are not. Multicollinearity can even cause coefficients to have the opposite sign in multiple regression than they would have in a simple linear regression with only a single explanatory variable. Coefficients in multiple regression should never be interpreted without investigating the individual relationships between each explanatory variable and the response.
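A small simulation makes the sign-flip phenomenon concrete. The data below are artificial: two explanatory variables correlated at about 0.9, with a response that depends positively on the first and slightly negatively on the second. The coefficient of x2 is positive in a simple regression but negative in the multiple regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)   # corr(x1, x2) about 0.9
y = 3 * x1 - 1 * x2 + rng.normal(size=n)                    # true partial effect of x2 is negative

# Simple regression of y on x2 alone: the slope is positive (about +1.7),
# because x2 acts as a stand-in for x1.
slope_simple = np.polyfit(x2, y, 1)[0]

# Multiple regression recovers the negative partial coefficient (about -1).
X = np.column_stack([np.ones(n), x1, x2])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(slope_simple, 2), np.round(coefs[1:], 2))
```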
Statisticians who teach multiple regression should emphasise the extent to which multicollinearity can lead to misleading results. Statistics educators at all levels should introduce examples that illustrate how interrelated variables can lead to erroneous conclusions. With the increasingly strong recommendations that introductory and school statistics should include many-variabled contexts and datasets, more early consideration needs to be given to these concepts. For example, the updated Guidelines for Assessment and Instruction in Statistics Education reports for both PreK-12 (Bargagliotti et al. 2020) and College (Carver et al. 2016) strongly emphasise the need for multivariable thinking at all levels of the curriculum.
Another issue that many users ignore is when missing data are not missing at random. For example, in a clinical trial to compare a drug and a placebo, those assigned to the drug may drop out in greater numbers due to side effects that are not present with the placebo. Or placebo users may drop out because they are not getting relief of their symptoms and opt to go for treatments that are known to work. An example would be a clinical trial to compare nicotine and placebo patches for smoking cessation. If the nicotine patches work to reduce craving but the placebo patches do not, placebo participants may drop out to seek other smoking cessation treatments. These considerations should always be taken into account in data analyses.

4.4 Ethical Reporting of Results

Even if a study is done meticulously with careful attention to ethics, reporting the results to clients, in journals, and to the media can be problematic. The following are some guidelines for ethical reporting of results:
  • Focus on effect magnitude, not p-values; with big data, small effects have tiny p-values (illustrated in the sketch following this list).
  • Include a clear explanation of uncertainty.
  • Do not overstate the importance of results.
  • Graphics should be clear and not misleading.
  • Media coverage should include all relevant results, not just the most interesting or surprising.
  • Do not imply causal connections that are not justified.
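The first guideline can be demonstrated with simulated data; the numbers below (a true difference of 0.02 standard deviations and half a million observations per group) are hypothetical and chosen only to mimic the scale of 'big data' comparisons.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)

n = 500_000
a = rng.normal(loc=0.00, scale=1.0, size=n)   # group 1
b = rng.normal(loc=0.02, scale=1.0, size=n)   # group 2: trivially larger mean

result = ttest_ind(a, b)
cohens_d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)

# The p-value is minuscule even though the effect is of no practical importance.
print(f"p-value = {result.pvalue:.1e}, Cohen's d = {cohens_d:.3f}")
```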
As a hypothetical example, suppose you are a consultant for a client who asks you to evaluate an online game with activities designed to boost children's math skills. The game has activities designed to boost language skills as well. The data provided include pre and post math and language scores, along with the amount of time each child devoted to the two types of activities. Now suppose you find that math scores went up on average, but that language scores went down, and you also notice that the game appeared to be somewhat addictive. You were asked only to evaluate whether the math scores went up. Are you ethically bound to report the negative consequences to the client as well? What about to the media, if you are asked to comment on your analysis?

4.5 Examples Illustrating Ethical Problems

In this section, two examples are presented that illustrate a combination of the ethical issues discussed so far.

Example 1. Facebook emotion study

A 2014 Cornell University press release was headlined ‘News Feed: Emotional contagion sweeps Facebook’. As introduced in Section 4.2.2, the headline was based on a study of 689 003 Facebook users who were randomly assigned without informed consent to have their negative news feed or their positive news feed reduced or to be part of a control group with news feed randomly omitted. The press release stated ‘People who had positive content experimentally reduced on their Facebook news feed for one week used more negative words in their status. When news feed negativity was reduced the opposite pattern occurred. Significantly more positive words were used in peoples' status updates’ (Segelken & Shackford 2014; emphasis added).
The results were published in the Proceedings of the National Academy of Sciences (Kramer et al. 2014) and received extensive media attention. One would think that the effect was at least moderate, given the publicity and headlines. However, here are the reported results from the published paper:
“When positive posts were reduced in the News Feed, the percentage of positive words in people's status updates decreased by B = −0.1% compared with control [t(310,044) = −5.63, P < 0.001, Cohen's d = 0.02], whereas the percentage of words that were negative increased by B = 0.04% (t = 2.71, P = 0.007, d = 0.001). Conversely, when negative posts were reduced, the percent of words that were negative decreased by B = −0.07% [t(310,541) = −5.51, P < 0.001, d = 0.02] and the percentage of words that were positive, conversely, increased by B = 0.06% (t = 2.19, P < 0.003, d = 0.008).”
As can be seen, the effect sizes were extremely small. But because the sample size was so large, the p-values were also very small. Graphs provided in the article used a y-axis scale designed to enhance the appearance of an effect. When addressing the small effect sizes in their paper, the authors defended the importance of their work by stating ‘And after all, an effect size of d = 0.001 at Facebook's scale is not negligible: In early 2013, this would have corresponded to hundreds of thousands of emotion expressions in status updates per day’ (Kramer et al. 2014).
The list of ethical issues in this example include no informed consent; misleading graphs; confusion of statistical significance with practical importance; and justification of small effect size as being of practical importance because of the large population affected.
Example 2. Were millions of women misled about hormone replacement therapy?
In July 2002, a large randomized clinical trial to assess the effects of hormone replacement therapy (HRT) in post-menopausal women was stopped early because an interim analysis showed increased risk of coronary heart disease and breast cancer in the women taking hormones. Although it had been widely believed that HRT increased the risk of breast cancer, the surprising result of the study was that it also increased the risk of heart disease, whereas earlier observational studies had shown HRT to be heart protective. The increased risk of heart disease received widespread media attention.
However, what did not receive widespread attention was that HRT also appeared to reduce other risks. The original article reported the results as follows: ‘Absolute excess risks per 10,000 person-years attributable to estrogen plus progestin were 7 more CHD [coronary heart disease] events, 8 more strokes, 8 more PEs [pulmonary embolisms], 8 more invasive breast cancers, while absolute risk reductions per 10,000 person-years were 6 fewer colorectal cancers and 5 fewer hip fractures’ (Writing Group for the Women's Health Initiative Investigators 2002).
In fact, overall, 231 out of 8 506 women taking the hormones died of any cause during the study, which is 2.72%. Of the 8 102 women taking the placebo, 218, or 2.69%, died, a result virtually identical to that in the hormone group. When the results are adjusted for the time spent in the study, the death rate was slightly lower in the hormone group, with an annualised rate of 0.52% compared with 0.53% in the placebo group.
The ethical problem here is that the media and medical community focused on the surprising heart disease results and did not widely publicise the ways in which HRT appeared to reduce risk. They did not mention that the HRT group fared better in many ways, including in the adjusted death rate. If full results had been reported in the media, women could decide for themselves, for instance, based on family or personal medical history. Instead, millions of women were advised to immediately stop taking hormones, a decision that caused widespread discomfort and possibly increased death rates for women with a history of colon cancer or osteoporosis and hip fractures.

Promoting Statistical Literacy

The issues raised in the previous sections should make it clear that statistics is at the heart of ethical data science. The AI and algorithm problems discussed in Sections 2 and 3 need statistical thinking to solve. The specific topics and examples covered in Section 4 depend on understanding basic statistical principles and the implications of violating them. Therefore, now more than ever statistical education at all levels should include a dialogue about ethics along with discussions of statistical ideas and methods.
Additionally, statistics educators have an ethical responsibility to promote statistical literacy at all stages of statistical education. A well-informed citizenry will be able to identify ethical breaches in the results of statistical studies and make better decisions in their work and personal lives. In 2003, I published a list of seven topics I thought were important for an educated populace (Utts 2003), updated it for the 2010 International Conference on Teaching Statistics (Utts 2010) and have been updating the list ever since. It has grown to a list of 10 topics, three of which will be discussed in detail in the following sections. Here are the 10 topics, in no particular order:
  1. Observational studies, confounding and causation
  2. The problem of multiple testing
  3. Sample size and statistical significance
  4. Poor intuition about probability and risk
  5. Why many studies fail to replicate
  6. Does decreasing risk actually increase risk?
  7. Personalised risk versus average risk
  8. Using expected values to make decisions
  9. Surveys and polls—good and not so good
  10. Confirmation bias and selective consumption of news
Topic 3 is illustrated by two previous examples in this paper. The problem of statistical significance for a large sample with a very small effect is shown in Example 1 (Facebook emotion study), and the problem of a non-significant result from a small sample with a large effect is shown in the right turn on red example discussed in Section 4.2.3. Topics 6 and 7 are partially illustrated in Example 2, in which certain risks of HRT received widespread attention while others did not, which may have led to increased risk for some women who stopped taking hormones.
In the following sections, discussion and examples will be covered for Topics 1, 4 and 8. Additional discussion and examples for most of these topics can be found in Utts (2003), Utts (2010), Utts (2015) and Utts and Heckard (2022).

5.1 Observational Studies, Confounding and Causation (Topic 1)

One of the most common mistakes made in the media is to attribute cause and effect relationships when they are not warranted. The following example illustrates this problem.

Example 3. Breakfast cereal and obesity

A Reuters News story in 2013 was headlined ‘Breakfast cereal tied to lower BMI for kids’ (Doyle 2013) and went on to explain ‘Regularly eating cereal for breakfast is tied to healthy weight for kids, according to a new study that endorses making breakfast cereal accessible to low-income kids to help fight childhood obesity’. Another news story reporting on the same study was even more explicit in attributing a causal connection between eating cereal and weight, with the headline ‘Breakfast cereals prevent overweight in children’. (Source: http://worldhealthme.blogspot.com/2013/04/breakfast-cereals-prevent-overweight-in.html.) The stories discussed an observational study that was part of a larger study on diabetes (Frantzen et al. 2013). Although data were collected on 1 024 children for the cereal results, only 411 had usable data. They were mostly low-income Hispanic children in the US state of Texas. The children were asked what foods they ate for 3 days in each of 3 years, and the number of days they ate cereal was used as one of the explanatory variables, ranging from 0 to 3 days for each year. The response variable was the child's body mass index (BMI) percentile in each of the years. Multiple regression was used, and although the explanatory variable of days eating cereal ranged only from 0 to 3, it was modelled with a linear relationship to BMI percentile. Age, sex, ethnicity and some other nutritional variables were included as well.
There are various problems in the analysis and reporting of this study. The second headline described earlier unjustifiably suggests a causal relationship, with cereal consumption resulting in lower BMI. The study did not differentiate between other breakfast foods and no breakfast, so it is possible that children who did not eat cereal ate something unhealthy for breakfast or that they ate no breakfast at all. A possible confounding variable is that children who had an unhealthy breakfast or no breakfast also had unhealthy eating habits at other meals and thus were more likely to have high BMI. And it is possible that there was a causal relationship in the other direction. People with high metabolism need to eat more often and also are likely to have lower BMI. Perhaps the children with slower metabolism did not eat breakfast, as is anecdotally the case for many adults with slow metabolism. A final possible ethical issue with the study was that the lead author was the vice president of a regional dairy council. Although funding for the study was not provided by the council, it is likely that anything that boosts cereal sales would also be likely to boost dairy sales, and as the Reuters story notes, the study ‘endorses making breakfast cereal accessible to low-income kids to help fight childhood obesity’.

5.2 Poor Intuition About Probability and Risk (Topic 4)

In the 1800s, psychologist William James noted that humans have an intuitive mind and an analytical mind and that these two parts of our minds process information differently. In his book Thinking, Fast and Slow (2011), psychologist and Nobel laureate Daniel Kahneman expands on this idea, defining System 1 [fast, intuitive, storytelling] and System 2 [slow, analytical, logical] thinking. He provides many examples of how we have poor intuitive understanding of statistics and probability because our first instinct is to use System 1 thinking. I discussed many aspects of this in my 2016 American Statistical Association presidential address (Utts 2016). A common example is that people feel safer driving than flying, presumably because they think they have more control, even though accident statistics strongly support the opposite safety conclusion. Psychologists have studied and named numerous instances of fallacious thinking affecting our intuition about probability and statistics. The following example describes one of them.

Example 4. Assessing threats and the conjunction fallacy

This example is similar to scenarios that have been studied extensively to illustrate the ‘conjunction fallacy’, which is aided by the ‘representativeness heuristic’. Choose which of the following you think is more likely:
  • A massive flood somewhere in North America in the next year, in which more than 1 000 people drown.
  • An earthquake in California sometime in the next year, causing a dam to burst and resulting in a flood in which more than 1 000 people drown.
Most people choose the second scenario as more likely, but that cannot be true because it is a subset of the first scenario. This illustrates the conjunction fallacy, which is that people erroneously assess P(A and B) to be more likely than either P(A) or P(B) alone; formally, P(A and B) = P(A) × P(B | A) ≤ P(A), because P(B | A) ≤ 1. As Kahneman notes, people substitute plausibility for probability. The second scenario tells a plausible story, so it invokes the ‘representativeness heuristic’ in which people place higher likelihood on scenarios that represent how they imagine the world works. Lawyers use this phenomenon by describing in exquisite detail how a crime might have happened to fit the evidence, hoping that the jury members will picture it that way and assign high plausibility to that scenario.
Another example of poor intuition about probability is to confuse P(A | B) with P(B | A). A common example is when patients confuse P(disease | positive test) with P(positive test | disease). For low-prevalence diseases, this confusion causes undue stress, especially when patients are told that the tests are highly accurate as reflected by the sensitivity and specificity. Medical practitioners often make this error as well, reinforcing the undue stress. A similar example is the ‘prosecutor's fallacy’, in which P(guilt | evidence) is confused with P(evidence | guilt). I have been told by a lawyer that law students are instructed to use this to their advantage in the courtroom.
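The distinction can be made vivid with Bayes' theorem. The sketch below uses hypothetical numbers (a prevalence of 1 in 1 000 and a test that is 99% sensitive and 99% specific) to show that P(disease | positive test) can be small even when P(positive test | disease) is high.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """Bayes' theorem: P(disease | positive test), where sensitivity is
    P(positive | disease) and specificity is P(negative | no disease)."""
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Hypothetical numbers: prevalence 0.001, sensitivity 0.99, specificity 0.99.
# P(positive | disease) is 0.99, but P(disease | positive) is only about 0.09.
print(round(prob_disease_given_positive(0.001, 0.99, 0.99), 3))
```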

5.3 Using Expected Values to Make Decisions (Topic 8)

There is no doubt that consumers spend more money than they should because they fail to understand the concept of expected value. For instance, extended warranties are priced to benefit the seller. But some consumers will come out ahead by buying such warranties. A statistically literate consumer would understand the difference between the average risk and payout and their own individual risk and payout. Someone who drives very little, uses only sparsely populated roadways, parks their car in a garage and lives in a low-crime area probably is not going to come out ahead by buying an extended warranty on their vehicle. Statistics students should be taught this concept so they can make intelligent decisions about lotteries, insurance, extended warranties and so on.
Example 5. Should you pay for a hotel room in advance?
Many hotels offer a reduced rate for paying in advance, but it is non-refundable even if the room is not used. Expected value can be used to determine whether a consumer should typically pay that rate or wait until the room is actually used and pay the higher rate. Suppose you plan to take an overnight trip but will cancel if the weather is bad. You are offered a non-refundable rate of $170 or a rate of $200 that will be paid only if you go on the trip. What should you do?
Suppose p is the probability that you will go on the trip. Then the expected cost is E(Cost) = $170 for the non-refundable rate, whether you go or not, and E(Cost) = $200 × p + $0 × (1 − p) = $200p for the refundable rate.
Which one is lower? E(Cost) is lower for the refundable rate if $200p < $170, or p < 170/200 = 0.85. Therefore, if the probability that you will go on the trip is at least 0.85, you are better off with the non-refundable rate, on average. This simple example can be used to illustrate to students the concept of using expected values to make decisions.
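The same calculation is easy to script for other prices or probabilities; the following minimal sketch uses the rates from the example and a few illustrative values of p.

```python
def expected_costs(p_go, nonrefundable=170.0, refundable=200.0):
    """Expected cost under each option, where p_go is the probability of taking the trip."""
    return nonrefundable, refundable * p_go   # the non-refundable rate is paid regardless

# Break-even at p = 170/200 = 0.85, as derived above.
for p in (0.70, 0.85, 0.95):
    nonref, ref = expected_costs(p)
    better = "refundable" if ref < nonref else "non-refundable"
    print(f"p = {p:.2f}: non-refundable ${nonref:.0f}, refundable ${ref:.0f} -> choose {better}")
```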

Conclusions

The purpose of this paper is to give some guidance to statisticians and data scientists on methods they can incorporate into their work to enhance ethical practice of our profession. It also emphasises that statistical thinking is at the heart of ethical data science. The importance of ethics in data science is gaining prominence within professional communities, but also with the media and the public. Statisticians have both an opportunity and a responsibility to play a major role in ensuring that the future of data science includes a major emphasis on ethics.
Discussions of ethics should be incorporated throughout the education of all students, whether they are going to work as data scientists or not. When statistical ideas and principles are discussed, the implications of misunderstanding or misusing them should be discussed as well. Here are some conclusions and recommendations for data science practitioners and educators:
  • Statistical ideas are at the heart of ethical data science. Understanding variability, bias and other statistical principles should inform the development and application of data-based algorithms.
  • Statisticians have a major role to play in implementing ethical data science. Ethical statisticians usually do not (and should not) have a vested interest in the outcome of a study or data analysis, unlike some other members of a team.
  • Statisticians need to speak up as members of multidisciplinary teams and sometimes take a leadership role in raising issues of ethics. Ask the whole team to consider why a project is being initiated before discussing how to do it. Consider the ethical pros and cons.
  • Ethical considerations enter into all phases of the data inquiry cycle, including study planning, data collection, data analysis and reporting results.
  • Because statistical ideas and data science ethics are intertwined, statistics educators need to incorporate discussions of ethics alongside technical issues. One practical idea is to include a requirement to discuss ethics in all relevant assignments and reports, including dissertations.
  • We need to teach all students how to identify ethical issues and mistakes in reports based on statistical studies, even if they take only one statistics course.
As a practicing statistician or educator, you have the power to make a difference in the ethics of how our science is applied and interpreted. Circling back to the beginning of this article, never allow the equivalent of a blood pipeline to be built without questioning the ethical implications and acting accordingly. And teach your students never to do so as well.

Acknowledgements

I would like to thank Professor Helen MacGillivray for the invitation to give the International Statistical Institute President's Invited Keynote Lecture at the World Statistics Congress in 2019, upon which this paper is based, and for helpful feedback that substantially improved this paper.

References

Angwin, J., Larson, J., Mattu, S. & Kirchner, L. (2016). Machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, published May 23, 2016; accessed January 23, 2021.
Bargagliotti, A., Franklin, C., Arnold, P., Gould, R., Johnson, S., Perez, L. & Spangler, D. (2020). Pre-K-12 Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report II. American Statistical Association and National Council of Teachers of Mathematics.
Bem, D.J., Utts, J. & Johnson, W. O. (2011). Must psychologists change the way they analyze their data?, J. Pers. Soc. Psychol., 101(4), 716–719.
Cantor, P. (2020). Monstrous messages. https://www.counterpunch.org/2020/11/02/monstrous-messages/, Nov 2, 2020, accessed January 23, 2021.
Carver, R., Everson, M., Gabrosek, J., Horton, N., Lock, R., Mocko, M., Rossman, A., Holmes Rowell, G., Velleman, P., Witmer, J. & Wood, B. (2016). Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report 2016. American Statistical Association. https://www.amstat.org/asa/education/Guidelines-for-Assessment-and-Instruction-in-Statistics-Education-Reports.aspx
Cheung, S. (2020). Disambiguating the benefits and risks from public health data in the digital economy. Big Data Soc., 7(1), 2053951720933924.
Crawford, K., Dobbe, R., Dryer, T., Fried, G., Green, B., Kaziunas, E., Kak, A., Mathur, V., McElroy, E., Nill Sánchez, A., Raji, D., Lisi Rankin, J., Richardson, R., Schultz, J., Myers West, S. & Whittaker, M. (2019). AI Now 2019 Report. New York: AI Now Institute. https://ainowinstitute.org/AI_Now_2019_Report.html
Doyle, K. (2013). Breakfast cereal tied to lower BMI for kids. Reuters News. https://www.reuters.com/article/us-health-breakfast/breakfast-cereal-tied-to-lower-bmi-for-kids-idINBRE93815320130409, published April 9, 2013, accessed January 23, 2021.
Dressel, J. & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Sci. Adv., 4(1), eaao5580.
Ford, E., Boyd, A., Bowles, J. K., Havard, A., Aldridge, R. W., Curcin, V. & Sperrin, M. (2019). Our data, our society, our health: A vision for inclusive and transparent health data science in the United Kingdom and beyond. Learn Health Syst, 3(3), e10191.
Frantzen, L. B., Treviño, R. P., Echon, R. M., Garcia-Dominic, O. & DiMarco, N. (2013). Association between frequency of ready-to-eat cereal consumption, nutrient intakes, and body mass index in fourth- to sixth-grade low-income minority children. J. Acad. Nutr. Diet., 113(4), 511–519.
Fry, H. (2018). Hello World: Being Human in the Age of Algorithms. WW Norton & Company.
Garzcarek U. & Steuer D. (2019). Approaching ethical guidelines for data scientists. In: Bauer N., Ickstadt K., Lübke K., Szepannek G., Trautmann H., Vichi M. (eds) Applications in Statistical Computing. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-25147-5_10
Gibson, E. W. (2019). Leadership in Statistics: Increasing our value and visibility. Am Statist, 73(2), 109–116. https://doi.org/10.1080/00031305.2017.1336484
Hauer, E. (2004). The harm done by tests of significance, Accid. Anal. Prev., 36(3), 495–500.
John, L. K., Loewenstein, G. & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci., 23(5), 524–532.
Kahneman, D. (2011). Thinking, Fast and Slow. Macmillan.
Kramer, A. D., Guillory, J. E. & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl. Acad. Sci., 111(24), 8788–8790.
Larson, J., Mattu, S., Kirchner, L. & Angwin, J. (2016). How we analyzed the COMPAS recidivism algorithm. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, published May 23, 2016; accessed January 23, 2021.
Lee, S. M. (2018). Here's how Cornell scientist Brian Wansink turned shoddy data into viral studies about how we eat. https://www.buzzfeednews.com/article/stephaniemlee/brian-wansink-cornell-p-hacking, Published Feb 25, 2018, accessed January 23, 2021.
O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown.
Segelken, H. R. & Shackford, S. (2014). News Feed: ‘Emotional contagion’ sweeps Facebook. https://news.cornell.edu/stories/2014/06/news-feed-emotional-contagion-sweeps-facebook, published June 10, 2014; accessed January 23, 2021.
Steinberg, L. (2018). The blood pipeline: What can urbanists learn from one smart professor? https://www.humankind.city/post/the-blood-pipeline-what-can-urbanists-learn-from-one-smart-professor, published Aug 2, 2018; accessed January 23, 2021.
Steuer, D. (2020). Time for data science to professionalise. Significance, 17(4), 44–45.
Utts, J. (2003). What educated citizens should know about statistics and probability. Am Statist, 57(2), 74–79.
Utts, J. (2010). Unintentional lies in the media: Don't blame journalists for what we don't teach. Invited paper for the International Conference on Teaching Statistics, 2010. https://iase-web.org/documents/papers/icots8/ICOTS8_1G2_UTTS.pdf?1402524969
Utts, J. (2015). Seeing Through Statistics, 4th edition. Stamford, CT: Cengage Learning. ISBN-13: 978–1285050881.
Utts, J. (2016). Appreciating statistics. J. Am. Stat. Assoc., 111(516), 1373–1380.
Utts, J. & Heckard, R. F. (2022). Mind on Statistics, 6th edition. Boston, MA: Brooks-Cole/Cengage Learning. ISBN-13: 978–133779305.
Verma, I. M. (2014). Editorial expression of concern. Proc. Natl. Acad. Sci., 111(29), 10779.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D. & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). J. Pers. Soc. Psychol., 100(3), 426–432.
Writing Group for the Women's Health Initiative Investigators (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women's Health Initiative randomized controlled trial, JAMA, 288 (3), 321–333.
Yong, E. (2018). A popular algorithm is no better at predicting crimes than random people. https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/, published January 17, 2018; accessed January 10, 2021.

HUMAN RIGHTS DATA ANALYSIS GROUP: STATISTICIANS FOR HUMAN RIGHTS

https://hrdag.org/knowledge-base/

The Human Rights Data Analysis Group is a non-profit, non-partisan organization that applies rigorous science to the analysis of human rights violations around the world. We are non-partisan—we do not take sides in political or military conflicts, nor do we advocate any particular political party or government policy. However, we are not neutral: we are always in favor of human rights. We support the protections established in the Universal Declaration of Human Rights, the International Covenant on Civil and Political Rights, and other international human rights treaties and instruments.

As scientists, we work to support our partners—the advocates and human rights defenders who “speak truth to power”—by producing unbiased, scientific results that bring clarity to human rights violence and by ensuring that the “truth” is the most accurate truth possible. While our partners—international and local human rights groups—advance human rights by listening to and amplifying the voices of victims of human rights violations, by shaping the questions we address and by guiding the data collection, we use technical and scientific expertise to analyze the invaluable data they collect. With these data, we use rigorous quantitative reasoning to understand patterns of violence, and even to make statistical estimates of events that are not in the data.

For our projects, data come from many sources. We have used individual testimonies, legal depositions, probability surveys, administrative records from morgues and cemeteries, exhumation reports, operational records from a prison, career information on military and police officers, eyewitness interviews, and official customs and immigration records. We work with partners to help them make decisions about the databases and systems they might use to collect and manage data; our primary focus, however, is on the rigorous scientific analysis of our partners’ data.

We believe truth leads to accountability, and at HRDAG, promoting accountability for human rights violations is our highest purpose. In the wake of mass killings and genocide, deportations and ethnic cleansing, and systematic detention and torture, accountability may mean many things. It could mean, simply, learning what really happened. Accountability could also mean a criminal trial for perpetrators. Or it might mean having the worst perpetrators removed from public office. Because accountability hinges on truth, we work toward discovering the most accurate “truth” possible. To this end, we apply statistical and scientific methods in the analysis of human rights data so that our partners—human rights advocates—can build scientifically defensible, evidence-based arguments that will result in outcomes of accountability.

We know that our work in data analysis is only one of many human rights approaches to investigating the truth. While our partners engage in an array of human rights approaches ranging from remote sensing by satellites to forensic anthropology to the qualitative interpretation of victims’ narratives, our work—data analysis—is one valuable piece in that puzzle. Through scientific analysis, we provide accurate knowledge and understanding of the past, and that knowledge can be used by international and local human rights groups to effect justice.
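A note on the “statistical estimates of events that are not in the data” mentioned above: one widely used approach to such estimates is multiple systems estimation, a capture-recapture method that uses the overlap among independently collected documentation lists to estimate how many events appear on no list at all. The sketch below is a minimal illustration only, using two hypothetical lists and the simple Chapman bias-corrected estimator; analyses of real human rights data typically require more than two lists and models that account for dependence among sources.

# Minimal two-list capture-recapture sketch (Chapman's bias-corrected
# Lincoln-Petersen estimator). All counts are hypothetical and for
# illustration only.

def chapman_estimate(n_a: int, n_b: int, m: int) -> float:
    """Estimate the total number of events from two overlapping lists.

    n_a -- events documented by source A
    n_b -- events documented by source B
    m   -- events documented by both sources (matched records)
    """
    if m < 0 or m > min(n_a, n_b):
        raise ValueError("overlap must lie between 0 and the size of the smaller list")
    return (n_a + 1) * (n_b + 1) / (m + 1) - 1

# Hypothetical example: two documentation projects record 600 and 450 killings,
# with 180 victims appearing on both lists.
n_a, n_b, m = 600, 450, 180
total_hat = chapman_estimate(n_a, n_b, m)
documented = n_a + n_b - m                  # victims on at least one list
undocumented_hat = total_hat - documented   # estimated victims on neither list
print(f"Estimated total: {total_hat:.0f}; documented: {documented}; "
      f"estimated undocumented: {undocumented_hat:.0f}")

With these hypothetical counts, the estimated total is roughly 1,500 killings, of which about 630 would have been documented by neither source; the bias correction in Chapman's formula matters most when the overlap between lists is small.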

STATISTICS' CENTRAL ROLE

“Statistics is central to the modern perspective on human rights. It allows researchers to measure the effect of health care policies, the penetration of educational opportunity, and progress towards gender equality. The new wave of entrepreneurial charities demands impact assessments and documentation of milestone achievement. Non-governmental organizations need statistics to build cases, conduct surveys, and target their efforts.”  https://www.aaas.org/programs/scientific-responsibility-human-rights-law/statistical-methods-human-rights

STATISTICS' CRITICAL ROLE

“The realization of human rights,” according to the Office of the High Commissioner for Human Rights of the United Nations, “correlates with the availability of sound official statistics. Statisticians play a critical role in supporting evidence-based policy and measuring civil, economic, political and social rights.” https://www.ohchr.org/sites/default/files/Documents/Issues/HRIndicators/StatisticsAndHumanRights.pdf

STATISTICS' IMPORTANCE

Statistics is especially important in the human rights field, because reliably produced information based on hard evidence can be a key element in securing a remedy for violations. -- Richard Pierre Claude, Science in the Service of Human Rights, University of Pennsylvania Press, 2002, p. 106.