Humans must be the custodians for preserving high-quality data as AI use continues to advance.

Artificial intelligence has advanced significantly since its inception in the 1950s. Today, we are seeing the emergence of a new era of AI: generative AI. Businesses are discovering a broad range of capabilities with tools such as OpenAI’s DALL-E 2 and ChatGPT, and AI adoption is accelerating among businesses of all sizes. In fact, Forrester predicts that AI software spend will reach $64 billion in 2025, nearly double the $33 billion spent in 2021.

Though generative AI tools are contributing to AI market growth, they exacerbate a problem that businesses embracing AI should address immediately: AI bias. AI bias occurs when an AI model produces predictions, classifications, or (in the case of generative AI) content based on data sets that contain human biases. Although AI bias is not new, it’s becoming increasingly prominent with the rise of generative AI tools. In this article, I will discuss some limitations and risks of AI and how businesses can get ahead of AI bias by ensuring that data scientists act as “custodians” to preserve high-quality data.

AI bias puts business reputations at risk

If AI bias is not properly addressed, the reputation of enterprises can be severely affected. AI can generate skewed predictions, leading to poor decision-making. It also introduces the risk of copyright issues and plagiarism when models are trained on data or content that is publicly available online. Generative AI models can also produce erroneous results if they are trained on data sets containing inaccurate or false content found across the internet.

For example, a study from NIST (National Institute of Standards and Technology) concluded that facial recognition AI often misidentifies people of color.
A 2021 study on mortgage loans found that predictive AI models used to accept or reject loans did not provide accurate recommendations for loans to minorities. Other examples of AI bias and discrimination abound.

Many companies are stuck wondering how to gain proper control over AI and what best practices they can establish to do so. They need to take a proactive approach to managing the quality of their training data, and that responsibility rests entirely with humans.

High-quality data requires human involvement

More than half of organizations are concerned that AI bias could hurt their business, according to a DataRobot report. However, nearly three-quarters of businesses have yet to take steps to reduce bias in their data sets. Given the growing popularity of ChatGPT and generative AI, and the emergence of synthetic data (or artificially manufactured information), data scientists must be the custodians of data. Training data scientists to better curate data and to implement ethical practices for collecting and cleaning data will be a necessary step.

Testing for AI bias is not as straightforward as other types of testing, where it’s obvious what to test for and the outcome is well-defined. There are three general areas to watch in order to limit AI bias: data bias (or sample set bias), algorithm bias, and human bias. Testing each area requires different tools, skill sets, and processes. Tools like LIME (Local Interpretable Model-Agnostic Explanations) and T2IAT (Text-to-Image Association Test) can help in discovering bias, but humans can still inadvertently introduce it. Data science teams must remain vigilant throughout the process and continuously check for bias. It’s also paramount to keep data “open” to a diverse population of data scientists, so that a broader range of people are sampling the data and identifying biases others may have missed.
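As a rough illustration of checking for data (sample set) bias, a team could compare each group’s share of a training set against the share it expects, for example from population statistics. This is a minimal sketch under stated assumptions, not a method described in this article or provided by tools like LIME or T2IAT: the dict-based record format, the `group_key` field, the reference shares, and the 5-percentage-point tolerance are all hypothetical.

```python
from collections import Counter


def representation_gaps(samples, reference_shares, group_key):
    """Return observed share minus expected share for each group in reference_shares.

    samples: iterable of dict records (hypothetical format)
    reference_shares: mapping of group -> expected fraction of the data set (assumption)
    group_key: name of the field holding the group label (hypothetical)
    """
    counts = Counter(record[group_key] for record in samples)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        # Counter returns 0 for groups that never appear in the samples.
        observed = counts[group] / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


def flag_underrepresented(gaps, tolerance=0.05):
    """List groups whose share falls more than `tolerance` below expectations."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


if __name__ == "__main__":
    # Hypothetical training records: 70% group A, 20% group B, 10% group C.
    samples = [{"group": "A"}] * 70 + [{"group": "B"}] * 20 + [{"group": "C"}] * 10
    # Hypothetical reference shares, e.g., drawn from population statistics.
    reference = {"A": 0.40, "B": 0.35, "C": 0.25}
    gaps = representation_gaps(samples, reference, "group")
    print(flag_underrepresented(gaps))  # ['B', 'C']
```

A large negative gap only suggests that a group is underrepresented relative to expectations. It says nothing about label quality, algorithm bias, or human bias, which need their own checks.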
Inclusiveness and human experience will eventually give way to AI models that automate data inspections and learn to recognize bias on their own, as humans simply cannot keep up with the high volume of data without the help of machines. In the meantime, data scientists must take the lead.

Erecting guardrails against AI bias

With AI adoption increasing rapidly, it’s critical that guardrails and new processes be put in place. Such guidelines give developers, data scientists, and anyone else involved in the AI production process a way to avoid potential harm to businesses and their customers.

One practice enterprises can introduce before releasing any AI-enabled service is the red team versus blue team exercise used in the security field. For AI, enterprises can pair a red team and a blue team to expose bias and correct it before bringing a product to market. It’s important to then make this an ongoing effort that continues to work against the inclusion of bias in data and algorithms. Organizations should commit to testing the data before deploying any model, and to testing the model after it is deployed.

Data scientists must acknowledge that the scope of AI bias is vast and that there can be unintended consequences despite their best intentions. Therefore, they must deepen their domain expertise and understand their own limitations, so they can be more responsible in curating data and algorithms. NIST encourages data scientists to work with social scientists, who have long studied the ethics of AI, and to tap into their learnings, such as how to curate data, to better engineer models and algorithms. When an entire team pays close attention to the quality of data, there is less room for bias to creep in and tarnish a brand’s reputation.

The pace of change and advances in AI is blistering, and companies are struggling to keep up.
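One way to make post-deployment model testing concrete is to compare outcome rates across groups, in the spirit of the mortgage example earlier. The sketch below assumes, hypothetically, that each logged decision carries a group label. It swaps in the “four-fifths rule” heuristic from U.S. employment-selection guidelines, which flags a model when the lowest group’s selection rate is under 80% of the highest. Neither the heuristic nor the function names come from this article.

```python
from collections import defaultdict


def selection_rates(groups, decisions):
    """Return the fraction of positive decisions (e.g., loan approved) per group."""
    if len(groups) != len(decisions):
        raise ValueError("groups and decisions must have the same length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        positives[group] += 1 if decision else 0
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact(rates):
    """Ratio of the lowest group selection rate to the highest (1.0 means parity)."""
    if not rates:
        raise ValueError("need at least one group")
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a positive decision, so there is no disparity to measure
    return min(rates.values()) / highest


def passes_four_fifths(rates, threshold=0.8):
    """Heuristic check (an assumption, not from the article): ratio must be at least 80%."""
    return disparate_impact(rates) >= threshold


if __name__ == "__main__":
    # Hypothetical logged decisions for two groups, "A" and "B".
    groups = ["A"] * 10 + ["B"] * 10
    decisions = [True] * 8 + [False] * 2 + [True] * 5 + [False] * 5
    rates = selection_rates(groups, decisions)
    print(rates)                      # {'A': 0.8, 'B': 0.5}
    print(disparate_impact(rates))    # 0.625
    print(passes_four_fifths(rates))  # False
```

A ratio above the threshold does not prove a model is free of bias. It is one signal a red team might track over time, alongside checks such as per-group error rates.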
Nevertheless, the time to address AI bias and its potential negative impacts is now, before machine learning and AI processes are in place and sources of bias become baked in. Today, every business leveraging AI can make a change for the better by being committed to and focused on the quality of data in order to reduce the risks of AI bias.

Ravi Mayuram is CTO of Couchbase, provider of a leading cloud database platform for enterprise applications that 30% of the Fortune 100 depend on. He is an accomplished engineering executive with a passion for creating and delivering game-changing products for industry-leading companies from startups to Fortune 500s.

Generative AI Insights provides a venue for technology leaders, including vendors and other third parties, to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.