Algorithms and other machine-centric systems like artificial intelligence (AI) are the highest-order data producers, taking input data from a large number of sources and, through computation and pattern detection, producing useful output data that mimics human decision-making. Algorithmic output data is thus often probabilistic rather than absolute. Simple algorithms might combine your smartphone’s location with the location of a bank at the time you make a withdrawal to assess the probability that you’re really standing in front of an ATM in another country. More complex algorithms might use your home address, census data, government records, and credit card history to determine what interest rate you should be charged on a new loan. The most publicly discussed algorithms are those that rank pages in search engine queries and those that attempt to match users to online advertising.
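A simple location check like the ATM example above could be sketched as follows. This is an illustrative toy, not any bank's actual system: the distance thresholds and risk scores are assumptions chosen only to show how raw location inputs become a probabilistic output.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def withdrawal_risk(phone_lat, phone_lon, atm_lat, atm_lon):
    """Map phone-to-ATM distance to a rough probability that the
    cardholder is NOT at the ATM (thresholds are illustrative only)."""
    d = haversine_km(phone_lat, phone_lon, atm_lat, atm_lon)
    if d < 0.5:   # phone is effectively at the ATM
        return 0.01
    if d < 50:    # same metro area; phone may simply be at home
        return 0.10
    return 0.95   # phone is in another city or country

# Phone in New York, ATM in London: high probability the card is being misused.
risk = withdrawal_risk(40.71, -74.01, 51.51, -0.13)
```

Note that the output is a probability, not a verdict: the algorithm cannot know where the cardholder is, only how consistent the input data points are with each other.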
More than most other data sources, algorithms are sensitive to bias and false input, and those that are statistically based tend to be backward-looking. For instance, an employee ranking system that uses race and gender as predictors of executive success might rank women and minorities lower simply because they were under-represented in past data, so the historical pool of successful executives skewed away from them. It is important to assess whether any dataset used in subsequent analysis was itself produced by an algorithm, so that a bias assessment can be incorporated.
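The under-representation effect can be made concrete with a toy sketch. The records below are entirely fabricated for illustration: both groups have an identical 30% promotion rate, but a naive scorer that counts past successful executives, rather than normalizing by group size, penalizes the smaller group anyway.

```python
# Hypothetical historical records: (group, promoted_to_executive).
# Group "A" is heavily over-represented; both groups succeed at 30%.
history = ([("A", True)] * 90 + [("A", False)] * 210 +
           [("B", True)] * 3 + [("B", False)] * 7)

def count_based_score(records, group):
    """Score a group by its raw number of past successful executives --
    the backward-looking shortcut the text warns about. Smaller groups
    are penalized even when their success *rate* is identical."""
    return sum(1 for g, promoted in records if g == group and promoted)

def rate_based_score(records, group):
    """Score by the group's historical promotion rate instead."""
    promoted = count_based_score(records, group)
    total = sum(1 for g, _ in records if g == group)
    return promoted / total

count_a = count_based_score(history, "A")  # 90 past successes
count_b = count_based_score(history, "B")  # 3 past successes
rate_a = rate_based_score(history, "A")    # 0.30
rate_b = rate_based_score(history, "B")    # 0.30 -- identical rate
```

Even the rate-based fix only addresses one failure mode; if the historical promotion decisions were themselves biased, any model trained on them inherits that bias.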
Artificial intelligence and machine learning are special-case algorithms that use statistical techniques to identify patterns in input data. A characteristic of these systems is that it is often difficult to determine exactly how the system as a whole arrives at its conclusions, making audits for bias difficult.
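When the model's internals cannot be inspected, one common audit approach is counterfactual probing: change only a protected attribute, hold every other input fixed, and measure how much the output moves. The sketch below uses a hypothetical stand-in for an opaque scorer (its hidden penalty is invented for the demonstration); a real audit would probe many applicants and look at the distribution of gaps.

```python
def opaque_model(applicant):
    """Stand-in for a black-box scorer whose internals we cannot read.
    (Hypothetical: it secretly penalizes members of group "B".)"""
    score = 0.5 + 0.004 * (applicant["income_k"] - 50)
    if applicant["group"] == "B":
        score -= 0.15  # hidden bias the audit should surface
    return max(0.0, min(1.0, score))

def counterfactual_gap(model, applicant, attr, alt_value):
    """Flip one protected attribute, hold everything else fixed,
    and return how much the model's output changes."""
    twin = dict(applicant, **{attr: alt_value})
    return model(applicant) - model(twin)

applicant = {"group": "A", "income_k": 60}
gap = counterfactual_gap(opaque_model, applicant, "group", "B")  # 0.15
```

The audit never needs to see inside the model; it treats the system as a black box, which is often the only option with statistically trained systems.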