In a recent paper, Japanese technology behemoth Sony described a possible way to measure bias in computer vision systems against some skin colors.
Computer vision systems have historically struggled to accurately detect and analyze people with yellow undertones in their skin. The standard Fitzpatrick skin type scale does not adequately account for variation in skin hue, focusing only on tone from light to dark. As a result, hue-related biases can go unmeasured: standard datasets under-represent people with yellow skin colors, and algorithms trained on them perform worse for these individuals.
This issue disproportionately affects certain ethnic groups, such as Asians, leading to unfair outcomes. For example, studies have shown that facial recognition systems developed in the West are less accurate for Asian faces than for faces of other ethnicities. The lack of diversity in training data is a key factor driving these biases.
In the paper, Sony AI researchers proposed a multidimensional approach to measuring apparent skin color in images to better assess fairness in computer vision systems. The study argues that the common practice of characterizing skin color with the Fitzpatrick skin type scale is limited, because the scale captures only skin tone from light to dark. Instead, the researchers suggest measuring both the perceptual lightness L*, to capture skin tone, and the hue angle h*, to capture skin hue ranging from red to yellow. The study’s lead author, William Thong, explained:
“While practical and effective, reducing the skin color to its tone is limiting given the skin constitutive complexity. […] We therefore promote a multidimensional scale to better represent apparent skin color variations among individuals in images.”
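Both quantities come from the CIELAB color space: L* is perceptual lightness, and the hue angle is h* = atan2(b*, a*), which moves from red (near 0°) toward yellow (near 90°) as b* grows relative to a*. The sketch below shows one way they might be computed from image pixels. It is a minimal illustration, not the paper's own code, and it assumes 8-bit sRGB input, a D65 white point, and that skin pixels have already been isolated (for example with a face-parsing mask). All function names are hypothetical.

```python
import math

# CIE XYZ coordinates of the D65 reference white (Y normalized to 1.0).
WHITE_D65 = (0.95047, 1.0, 1.08883)


def srgb_to_linear(channel: float) -> float:
    """Undo the sRGB transfer curve for one channel scaled to [0, 1]."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def lab_f(t: float) -> float:
    """Nonlinear compression used in the XYZ -> CIELAB conversion."""
    delta = 6 / 29
    if t > delta ** 3:
        return t ** (1 / 3)
    return t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(r: int, g: int, b: int) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB pixel to CIELAB (L*, a*, b*) under D65."""
    rl, gl, bl = (srgb_to_linear(v / 255) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ using the standard sRGB (D65) matrix.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    fx, fy, fz = (lab_f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def skin_lightness_and_hue(pixels: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Return (L*, h* in degrees) for a set of skin pixels.

    Hypothetical helper: it averages L*, a*, b* over the pixels and then
    derives the hue angle from the mean a* and b*, which sidesteps the
    wrap-around problem of averaging angles directly.
    """
    if not pixels:
        raise ValueError("need at least one skin pixel")
    labs = [srgb_to_lab(*p) for p in pixels]
    n = len(labs)
    mean_l = sum(lab[0] for lab in labs) / n
    mean_a = sum(lab[1] for lab in labs) / n
    mean_b = sum(lab[2] for lab in labs) / n
    # h* = atan2(b*, a*): 0 degrees points toward red (+a*), 90 toward yellow (+b*).
    hue = math.degrees(math.atan2(mean_b, mean_a)) % 360
    return mean_l, hue
```

As a sanity check, a single pure-red pixel, `skin_lightness_and_hue([(255, 0, 0)])`, gives roughly L* ≈ 53 and h* ≈ 40°.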
The researchers demonstrated the value of this multidimensional approach in several experiments. First, they showed that standard face image datasets such as CelebAMask-HQ and FFHQ are skewed toward light-red skin colors and under-represent dark-yellow skin colors. Generative models trained on these datasets reproduce a similar bias.
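One simple way to quantify this kind of dataset skew is to place each face's (L*, h*) measurement into a coarse light/dark and red/yellow group and report the share of the dataset in each group. The sketch below does exactly that. The cut-off values are hypothetical placeholders chosen for illustration, not thresholds taken from the Sony AI paper, and the function names are likewise assumptions.

```python
from collections import Counter

# Hypothetical cut-offs for illustration only; they are assumptions,
# not values taken from the Sony AI paper.
LIGHTNESS_CUTOFF = 60.0  # L* above this is treated as "light"
HUE_CUTOFF = 55.0        # h* (degrees) above this is treated as "yellow"

CATEGORIES = ("light-red", "light-yellow", "dark-red", "dark-yellow")


def skin_color_category(lightness: float, hue_deg: float,
                        l_cut: float = LIGHTNESS_CUTOFF,
                        h_cut: float = HUE_CUTOFF) -> str:
    """Map an (L*, h*) pair to one of four coarse skin color groups."""
    tone = "light" if lightness > l_cut else "dark"
    hue = "yellow" if hue_deg > h_cut else "red"
    return f"{tone}-{hue}"


def category_shares(samples: list[tuple[float, float]]) -> dict[str, float]:
    """Fraction of (L*, h*) samples falling into each skin color group."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(skin_color_category(l, h) for l, h in samples)
    # Counter returns 0 for groups that never appear, so every group is reported.
    return {cat: counts[cat] / len(samples) for cat in CATEGORIES}
```

For example, `category_shares([(70, 40), (72, 45), (65, 60), (40, 62)])` returns `{'light-red': 0.5, 'light-yellow': 0.25, 'dark-red': 0.0, 'dark-yellow': 0.25}`, making an empty or thin group immediately visible.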
Second, the study revealed skin tone and hue biases in saliency-based image cropping and face verification models. Twitter’s image cropping algorithm showed a preference for light-red skin colors. Popular face verification models also performed better on light and red skin colors.
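A generic way to surface a performance gap like the one described for face verification is to break accuracy down by skin color group and report the spread between the best- and worst-served groups. The sketch below is a plain per-group accuracy comparison, not the specific evaluation protocol used in the paper. It assumes each trial has already been labeled with a skin color group (for example by binning the face's L* and h*); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def per_group_accuracy(results: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Accuracy per skin color group.

    `results` holds (group, is_correct) pairs, e.g. one pair per face
    verification trial, labeled with the group of the probe face.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for group, is_correct in results:
        totals[group] += 1
        hits[group] += int(is_correct)
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(accuracies: dict[str, float]) -> float:
    """Difference between the best- and worst-served groups."""
    if not accuracies:
        raise ValueError("no groups to compare")
    return max(accuracies.values()) - min(accuracies.values())
```

Reporting the gap alongside overall accuracy keeps a strong average from hiding a weak group.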
Finally, manipulating skin tone and hue revealed causal effects in attribute prediction models. People with lighter skin tones were more likely to be classified as feminine, while those with redder skin hues were more frequently predicted as smiling. Thong concluded:
“Our contributions to assessing skin color in a multidimensional manner offer novel insights, previously invisible, to better understand biases in the fairness assessment of both datasets and models.”
The researchers recommend adopting multidimensional skin color scales as a fairness tool when collecting new datasets or evaluating computer vision models. This could help mitigate issues like under-representation and performance differences for specific skin colors.