Potentially error-prone, unsecured, and hard to maintain, spreadsheets create data silos and discourage collaboration.

Spreadsheets are a powerful, multipurpose tool used by businesspeople, analysts, data scientists, and technologists to collect, integrate, cleanse, analyze, and present data. They became popular in the 1980s with VisiCalc and Lotus 1-2-3, and Microsoft Excel began its dominant run in the 1990s. Today, Excel remains a leading Microsoft 365 platform but has competitors, including Google Sheets and Zoho Sheet.

As technologists, we’re well aware of the saying, “Pick the right tools for the job,” and there are many reasons people will continue to use spreadsheets for at least the next decade. They’re certainly versatile tools for lightweight data analysis, sharing, and presentation, but their usefulness diminishes with growing data sets, increasing collaboration, and more business-critical workflows.

In my book, Driving Digital, I share some of the large-scale spreadsheet issues and materialized risks that befell companies such as Enron. The European Spreadsheet Risks Interest Group tracks spreadsheet-related issues. Recent reports include 16,000 lost COVID-19 test results and delays in opening a hospital.

Spreadsheets can undermine efforts to become a data-driven organization and, left without guardrails, overuse of spreadsheets can really kill your business. As technologists, we need to understand problems, identify solutions, and align stakeholders on transformation. Here are five reasons why spreadsheets at scale can be problematic.

Spreadsheets can create data quality issues

Justin Gage, community lead at Retool, recognizes that people and teams use spreadsheets for data entry and elements of their workflows. Although spreadsheets have data validation tools, flagging invalid dates, numbers, and other primitive data types covers only the basics.
Many spreadsheet users don’t know how to use these validation tools, and you can end up with names, addresses, phone numbers, and other simple information in multiple formats.

Gage says that scaling input validation in spreadsheets is problematic. “Spreadsheets are a great quick way to get started if you need to keep track of data, chart simple trends, or build basic tools. But they don’t scale. Once you need to start thinking about input validation, user access controls, or anything custom, you get stuck.”

Complex spreadsheets are often error-prone

Spreadsheets are a primary tool used in manual processes, and errors can come from copied and pasted cells, mistakes in formulas, added rows and columns, and myriad other slips. It all adds up to erroneous results.

Scott Henderson, CTO at Celigo, says, “Manual processes, such as mundane spreadsheets, are to blame for costly errors, disconnected data silos, and decreased productivity in every department. Automating processes across the business saves time and resources and provides a holistic view for better decision-making.”

The problem can go from bad to worse because complex formulas, scripting, and other advanced techniques are hard to debug. Doug Fuehne, senior VP at PriceFX, adds, “With manual creation and often impenetrable formulas, errors are common and hard to find and eliminate.”

Maintaining spreadsheets is time-consuming and a compliance risk

Beyond their defects and potential problems, spreadsheets are time-consuming to develop and difficult to maintain.

Luke Jacobs, CEO and cofounder of Encamp, says, “Using spreadsheets is a tedious and time-consuming process that leaves plenty of room for mistakes to happen.
Not only do spreadsheets increase the risk of human error, but they also require copious amounts of time that could be put toward other crucial tasks.”

Ian White, ChartHop founder and CEO, shares an interesting use case that raises concerns about errors, manual work, and potential compliance issues when working with employee data and other sensitive information.

White says, “Spreadsheets have long been the only way to compile employee data together from all the systems that HR teams typically use. It leads to hours upon hours wasted on ‘spreadsheet gymnastics’ without producing meaningful answers to the most basic questions that every company should be able to answer about their employee population.”

Spreadsheets can impede collaboration and create data silos

How many times have you heard someone say, “Just email me the spreadsheet”? Emailing documents still happens today even though employees can share access on OneDrive or Google Drive.

Having multiple people collaborate on a spreadsheet can be tricky. Who changed the data? How did these rows disappear? Why was the formula changed? Changes are hard to track when a team uses spreadsheets for collaboration instead of an app or other tool to manage workflow.

Fuehne adds, “Spreadsheets do not easily support team environments with multiple parties making updates and changes, and having to understand the logic. The combination of these factors means that the business faces not only revenue risks but also high costs of operation.”

Collaborating with spreadsheets also increases the likelihood of creating data silos. When analysts connect to data sources, download data, and create formulas, pivots, and other data operations, they effectively build an isolated, derived data source. Without practices in place to capture data processing steps and centralize the derived data, it’s unlikely that others in the organization will know about this data source.
This cycle perpetuates itself when additional analysts tap into the same data and create duplicate derived data sources.

Of course, problems only worsen when employees share spreadsheets with customers and partners who don’t have access to the corporate network. In many situations, people send these files by email, and it’s hard to protect sensitive data.

Large data sets often don’t perform well

It wasn’t long ago that Excel supported only 65,536 rows per worksheet; even today, Microsoft limits worksheets to 1,048,576 rows and 16,384 columns, caps column widths at 255 characters, and imposes other constraints. Many analysts work with gigabyte- or terabyte-scale data sets, and analyzing them with spreadsheets is rarely feasible.

Fuehne says this is a concern for businesses because “spreadsheets do not scale. Complex calculations that rely on data from the market or a significant volume of transaction data are slow to update and often freeze.”

Even medium-size data sets pose a challenge that goes beyond performance. With all the data visualization tools and machine learning capabilities available, there are more efficient and smarter ways to analyze long and wide data sets.

So, although spreadsheets are versatile, better data management, visualization, collaboration, and integration options exist today for when businesses need more robust platforms. In a future post, I’ll share types of tools and platforms that developers and technologists should consider when replacing or upgrading spreadsheets.