The Politics of Data Warehousing

Marc Demarest
marc@noumenal.com

June 1997

   
          

ABSTRACT

Data warehousing projects are frequently side-tracked or derailed completely by non-technical factors, in particular the political treaty lines within the firm and the politicized nature of data itself. Because data warehouses are infrastructure for sociotechnical systems (STSs) within the firm, politics and the exercise of power are inherent in data warehousing projects. To optimize the chances of successfully deploying a data warehouse, designers have to adopt work practices and methods from non-technical disciplines, think of themselves in new ways, and employ some fairly sophisticated qualitative sociological methods.

          

Where Are The Failed Data Warehousing Projects?

The signal-to-noise ratio in the data warehousing market is the lowest it has ever been. As more and more vendors enter the marketplace, as more and more professional conference-making companies create more and more professional conferences, as the number of articles that cross our desks each week climbs and climbs, I can’t help feeling that we are missing something in the loud and often pointless public discourse on data warehousing: something fairly important. Where are the failed data warehousing projects?

I believe -- although I have only my own experience as a consultant, and quiet talk with some of my peers, to support this belief -- that the data warehousing marketplace woefully underreports both actual cases of failed data warehouse projects, and normative and prescriptive information for warehousing practitioners on techniques for avoiding project failure.

Why actual cases of failure are underreported will be obvious to anyone who’s ever written or approved a capital appropriation request for a million dollars: misspending that much money is not something you talk about with PC Week if you intend to remain employed or get another job in the IT business.

What’s not so obvious to me is why the normative or prescriptive information about failure-avoidance in data warehousing projects is so scanty. If we mark the founding of the data warehousing discipline from Inmon’s seminal definition in 1990 (and surely the discipline is older than that by any practical standard, particularly if we give credit to Commander and CommandCenter as the progenitors of OLAP), we have at least a decade of collective experience in the discipline, multiplied by, say, 10,000 practitioners worldwide, each working a nominal 8-hour day for 220 days a year: 176,000,000 person-hours of experience. That ought to be more than enough collective guild practice for some clear prescriptive models for failure avoidance to have emerged by now.
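For readers who want to check that back-of-envelope figure, here is a minimal sketch in Python. The constant and function names are illustrative assumptions, and the inputs are only the rough estimates quoted above, not measured data.

```python
# Rough inputs taken from the estimate in the text; all are nominal guesses.
YEARS_OF_PRACTICE = 10        # "at least a decade"
PRACTITIONERS = 10_000        # "say, 10,000 practitioners worldwide"
HOURS_PER_DAY = 8             # nominal working day
WORKING_DAYS_PER_YEAR = 220   # nominal working year


def person_hours(years: int, people: int, hours_per_day: int, days_per_year: int) -> int:
    """Multiply the four nominal factors into total person-hours."""
    return years * people * hours_per_day * days_per_year


total = person_hours(YEARS_OF_PRACTICE, PRACTITIONERS, HOURS_PER_DAY, WORKING_DAYS_PER_YEAR)
print(f"{total:,} person-hours")  # prints "176,000,000 person-hours"
```

Halving any one input halves the total, so the conclusion does not depend on the exact guesses: even a far more conservative estimate leaves tens of millions of hours of collective practice.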

Yet the information in the IT trade press -- our disciplinary best-practices exchange mechanism -- is almost nonexistent. For example, using a common CD-ROM based database of about 70,000 IT journal articles from 110 periodicals covering the period July 1995 to July 1996, the query:

    ("data warehouse" OR "data warehousing") AND ("fail" OR "failure")

returned fewer than 100 articles, while the query

    ("data warehouse" OR "data warehousing") AND ("success" OR "successful")

returned 279 articles, and the query

    ("data warehouse" OR "data warehousing")

returned 1283 articles.
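To put those hit counts in proportion, here is a small sketch in Python that turns them into shares of all data warehousing articles in the database. The names are illustrative assumptions, and because the first query returned "fewer than 100" articles, the sketch uses 100 as an upper bound rather than an exact count.

```python
# Hit counts quoted above. The "fail" query returned fewer than 100
# articles, so 100 is used here only as an upper bound (an assumption).
TOTAL_ARTICLES = 1283
FAIL_HITS_UPPER_BOUND = 100
SUCCESS_HITS = 279


def share(hits: int, total: int) -> float:
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


fail_share = share(FAIL_HITS_UPPER_BOUND, TOTAL_ARTICLES)
success_share = share(SUCCESS_HITS, TOTAL_ARTICLES)

print(f"failure: under {fail_share:.1f}% of data warehousing articles")
print(f"success: {success_share:.1f}% of data warehousing articles")
```

On these figures, articles mentioning failure make up less than 8 percent of the coverage, against roughly 22 percent for articles mentioning success.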

Assuming I wasn’t being tricked by the disclosure of failure avoidance tips under the headings of "how to succeed" articles, I looked into the articles returned by the first query, and found some of what I was looking for. Most articles contained throw-away, unimplementable advice:

    But organizations considering this project must also consider the deployment or data administration issues. Without a well-thought-out strategy, the data warehouse will fail. At the very least, a warehouse deployment strategy must have three key goals: consistent and synchronized data, data quality, and adequate end-user query and access tools.[1]

Such incisive analysis...

Some of the articles were more substantive, however. Larry Greenfield, one of the more astute observers on the data warehousing scene, lists these ten warnings:

  1. You may need to spend 80% of your time extracting, cleaning, and loading data. Consequently, you may not have enough time to build applications against the data.

  2. You are going to find hidden problems in the systems feeding the data warehouse.

  3. Warehouse projects often turn up the need for data not being captured by existing systems.

  4. After end users receive query and reporting tools, requests for IS written reports may increase rather than decrease, contrary to your ROI forecast.

  5. Your warehouse users will develop conflicting business rules. Many query, reporting, and OLAP tools allow users to perform calculations, and there is near certainty that users will perform the same calculation differently.

  6. Large-scale data warehousing can become an exercise in data homogenizing that lessens the value of data.

  7. Overhead can eat up great amounts of disk space.

  8. You can't assign security when using a transaction-processing system mindset.

  9. Data warehouses are high-maintenance systems.

  10. You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value for the customer.[2]

All true, I think, and all important considerations. But articles of this type were the exception, not the norm, which looked more like this:

  1. Pick a careful target. A data warehouse is a complex undertaking. It can also have a dramatic impact on the bottom line. Thus, you should pick a relevant pilot project that can demonstrate the technical considerations and prove that such a warehouse can be productive for the organization.

  2. Define consistent business rules. The most important aspect of setting up a data warehouse is to define business rules consistent with the data that will be going into the warehouse.

  3. Create a metadata model. Politics often impact IT projects. The worst way to start a warehousing project is to develop a corporate-wide data warehouse and define all the business rules across the company.

  4. Demonstrate bottom-line results. [3]

"A data warehouse is a complex undertaking." There’s rocket science-class analysis for you. "Create a metadata model" -- you can guess what this consultant is selling as the cure-all for data warehousing. All in all, we don’t seem to be telling one another very much about what kinds of complexity we’re dealing with, why data warehousing projects fail, and what we can do, as practitioners, to prevent project failures.

Modeling Data Warehousing Project Failure

It seems to me that all the advice in the trade press, all my personal experience, and what I know of other practitioners’ experiences, lead to a pretty simple four-category model of warehousing project failures. Data warehousing, data marting and other kinds of decision support systems (DSS) projects fail because of:

  1. Design Factors: the architecture of the system is in some way deficient. Metadata is ignored as a structural concern, data engineering complexity and time factors are minimized, schemas are designed inappropriately or in a vacuum, client-side tools are either neglected as components or allowed to dominate the design. No consistent business-oriented design method is used, or no method at all is used.

  2. Technical Factors: the technology components selected for integration into a warehouse ensemble are in some way deficient. The wrong components are chosen, the wrong components are allowed to drive design or implementation, vendor claims are not tested, scalability with respect to data set sizes, query volume or network traffic goes unexamined.

  3. Procedural Factors: the way in which the warehouse is built and deployed is in some way deficient. The project is scoped improperly; scope creep isn’t countered effectively; proof-of-concept implementations are not used or not used appropriately; user communities are not involved in the design phase; operations and management procedures in the data center are not tested against the new and different requirements of warehouse environments; pilots do not precede full-scale implementation; designers, implementors, DBAs, operators and end-users aren’t trained.

  4. Sociotechnical Factors: people and politics aren’t considered explicitly within the project scope.

Of these four categories, only the last -- the sociotechnical factors -- is significantly new for most IT organizations. Design factors have always been in play in complex IT projects; data warehousing introduces new design models and disciplines, but doesn’t change the fundamental playing field. Similarly, technical factors have always played a role in IT project success and failure; all data warehousing projects do is aggravate this historical problem area by (a) increasing the number of separately purchased components involved in the project and (b) significantly raising the integration burden, in terms of development and testing activities, imposed on the IS organization. Procedural factors are also familiar territory, though the presence of a large data warehouse platform on the data center floor frequently means new database technologies, different backup and recovery processes, and sometimes foreign systems management and tuning tools and processes.

But the sociotechnical factors are largely new; this area was pretty much submerged in classical OLTP design. People and politics were less of an issue, or no issue at all. The target user community held little organizational power, was rarely seriously consulted during design, and could in most cases be compelled to use the system once it was deployed (though there are those interestingly complex unionized situations). And the project’s sponsor was almost invariably after a rock-solid, measurable business objective: process or task routinization, cost containment, workforce reduction.

It’s this last area -- people and politics -- that I’m really interested in, because the most painful, indirect and ultimately revelatory discussions I have with my clients are centered around the politics of data warehousing. Sometimes these conversations begin under the heading of "engaging with the business unit"; other times they are framed by the question "How do we cost-justify a data warehouse?" or "How do we get management buy-in on a data warehouse?" Infrequently, they are frank enough that other, more central and more overtly political topics surface: how do we get the business units’ IT organizations (or end-user communities) to trust us?

I have had these conversations quite often in the last couple of years, particularly when my firm is rebuilding one of our customers’ data warehousing projects after another firm has cratered the customer’s project and beat a hasty retreat. Although I can’t prove it, I am convinced that the factors most often contributing to data warehousing project failure are not design factors or technology factors or procedural factors, but sociotechnical factors: people and politics.

Talking About Politics

If we were honest with ourselves, as professionals, we would admit what Rosabeth Moss Kanter suggested in 1979 in a famous Harvard Business Review article: that

    Power is America’s last dirty word. It is easier to talk about money -- and much easier to talk about sex -- than it is to talk about power. People who have it deny it; people who want it do not want to appear to hunger for it; and people who engage in its machinations do so secretly.

Power is also the last dirty word in data warehousing. And it’s crippling the discipline, in my view. Anyone who’s actually done a warehousing project knows very well that most of their organizational energy was spent dealing with political issues of a few particular sorts, and yet the query

    ("data warehouse" OR "data warehousing") AND ("political" OR "politics")

against the article database I mentioned earlier returned only 29 articles. The vast majority of these articles are notable for the vacuity of their comments on the political factors in data warehousing, resorting to pat formulae like:

    IS managers are realizing that the road to data warehousing is littered with as many people and political land mines as it is with technology obstacles. Turf battles, user resistance, and power struggles can be as critical to the success of a data warehouse as choosing the right back-end database or design schema. IS managers need to be properly prepared to effectively manage people and finesse organizational issues.[4]

and this gem, found in a value-added reseller trade publication in an article on how "hot" the data warehousing market is for VARs:

    Another factor you need to consider before rushing headlong into data warehousing can be summed up in one word: Politics. Since the projects are multiyear deals with costs frequently running into seven figures, organizational politics become a key factor in the success of these endeavors.[5]

Apparently at least one vendor community recognizes the sociotechnical factors in the market, if only as an obstacle to quick and easy revenue.

A couple of well-put, pointed articles did emerge, particularly two by Julia Vowler[6]. But there was no systematic discussion of the politics of data warehousing to be found anywhere in the almost 1300 articles in the database.

This fact, and a couple of recent conversations with clients, led me to ask myself a few questions:

  • what does "politics" mean in the context of data warehousing?

  • how would a project team know, before it even started analysis and design work on a warehouse, that they were embarking on a political odyssey? what are the top 10 signs of a politicized data warehousing project?

  • what are the 10 practical things a project team can do to minimize the impact of the sociotechnical factors -- people and politics -- on a data warehousing project?

Politics In Data Warehousing

Data warehousing projects are always potentially political because:

  • they cross organizational treaty lines

  • they change both the terms of data ownership and data access, and expose the often-checkered history of data management in the IT organization

  • they affect the work practices of highly autonomous and powerful user communities in the firm.

Politics, Part One: Treaty Line Violations

Any sensible data warehouse design is a part of a larger architectural model designed to deliver data from the points of capture (inside or outside the firm) to the points of use, probably transforming the data elements in the process. A warehouse, in other words, is the key data consolidation and pumping station in a complex data distribution system that begins with the firm’s production applications and external data syndicates[7], wholesalers[8] and enrichers[9] of various sorts, and ends on the intelligent desktops of managers, analysts, customer care personnel and the like. That network always crosses treaty lines: invisible boundaries within the firm that mark both "turf" and "domains of control".

Some of these treaty lines are functional (with the inevitable tensions that are an in-built feature of the functional organization as an organizational form), but the most insidious treaty lines are almost never functional. Consider, for example, the near-invisible treaty line that is drawn across the backplane of every PC in the modern corporation. Much like the residence jack in the American telephone system, the desktop treaty line marks a boundary between the "provider" and the "consumer." Just as we are free to choose our telephones, PC users feel, imperatively, that they are in rightful possession of their personal computers[10], the tool configuration of which is a private affair. When a warehousing project mandates toolsets or impinges on the desktop in other ways, the treaty line is crossed, and warning bells often ring out.

Politics, Part Two: Data Ownership and Data Access

If there is any rule that does apply across organizations regardless of their market focus or structure, it is this: power accrues to those who:

  • gather data

  • control access to that data.

The irony in data warehouse projects is that, all too often, this rule now works against the very organization it used to work for: the corporate IS organization. As often as not, the data we can’t get for a data warehouse is the data controlled by semi-autonomous divisional or departmental IS functions. Those functions came into being because the corporate IS organization’s historically stingy data access policies hamstrung a division, department or business unit so badly that it built its own IS function to regain the autonomy it needed for marketplace effectiveness. The dangers of centralized data control constitute, at some fundamental level, the raison d’etre of too many distributed IS organizations, and as a result their willingness to collaborate with the adversary is minimal at best[11].

But the real political problem with data warehousing is not the loss of data ownership. It is that such projects imply, for every organization asked to contribute to the warehouse, a loss of control over access to the raw data itself, something frightening for any group with:

  • dirty data

  • ambiguous data

  • unflattering data.

which, in my experience, is pretty much every commercial organization operating today. Inasmuch as divisions, departments and business units have gently cooked their dirty, ambiguous and unflattering data for years in the interest of keeping things clear and simple (and in the legitimate interest of not allowing bad or ambivalent data to get in the way of clarity about the real state of the business at whatever level), these organizations are understandably leery about exposing the uncooked data to inspection by other groups that may have a vested interest or historical reason to point out the data quality issues.

Politics, Part Three: Work Practice Integration

We are comfortable with the notion that production and service workers are obliged, by the terms and conditions of their employment, to submit to the discipline of information technology. When I was an accounts receivable data entry clerk for IBM in the early 1980s, I did my work as the system told me to do it: I stepped through tasks the IT presented to me, in the order it presented them to me, on a 327x terminal screen, for six hours a day, every day. That was my job: to be disciplined by the machine.

As I’ve walked up the hierarchy over the last fifteen years, from production worker to service worker to knowledge worker, my IT has backed off, and then come back not as a disciplining force, but a facilitating force. That is so in part because the firm is not permitted, by the terms and conditions of employment, to inspect or discipline the work practices of knowledge workers. Attempts to do this -- even relatively benign ones such as TQM process definition initiatives or restatements of the obvious fact that electronic mail is company property -- are met with resistance on a grand scale, muttering about invasion of privacy, and the magic word: "professionalism." Professionals, it seems, are above needing external discipline: because they practice a profession, whatever discipline they need with respect to what they do and how they do it is supplied by the profession, not by IT.

One of the things knowledge workers do, early and often, in a mostly unscrutinized way, is make decisions. Small and large, important and inconsequential, decisions get made every day about all kinds of things. And the plain fact seems to be that most of these decisions are data-free or data-poor, made based on "experience" or "gut feel" or some other intangible, unmeasurable quality that is decidedly human, decidedly part of these magic professional disciplines and decidedly not something a computer can frame, direct or do (see Treaty Lines above).

The work practice of decision-making has historically been done outside the IT infrastructure of the firm. Data warehousing projects threaten this long-standing practice. And they create, in knowledge workers, what Thorstein Veblen called, in another context, "the conscientious withdrawal of efficiency": passive-aggressive behavior on the part of knowledge worker communities that includes

  • an unwillingness to participate in requirements gathering, schema design activities, and pilots

  • failure to use deployed warehouses and marts

  • endless, and pointless, micro-analysis of the "quality" or "real meaning" of the data provided by warehouses and marts.

Sensing The Political: 10 Warning Signs

How can a project team, composed mostly of IS personnel, know before they start that their data warehousing project is, or is likely to become, politicized? While your mileage may vary, there are common, clear signs that your project is or will become politicized:

  1. a strong but poorly defined sense of urgency, often bordering on the frenetic, that is driving the project but cannot be clearly linked to changes in the firm’s market position or financial health

  2. demands made by IT management or senior business management for a "cost justification" or "business case" without clear indications of the targets for that justification or case

  3. lack of a strong, vocal business constituency advocating for the project

  4. project sponsors that do not include both IT and business representation

  5. a stated project objective focused on "data" rather than on "revenue," "sales," "customer satisfaction," "marketing," or some other business objective

  6. people inside and outside the project team talking about the data warehouse primarily or exclusively in terms of its technology components

  7. warehouse project team members who cannot explain to themselves how the deployment and subsequent use of the technology ensemble will materially (a) contain or reduce costs, (b) enhance revenue or (c) manage risk

  8. end-user organizations who do not consistently devote time, at the individual level, to warehouse or mart analysis, design and pilot activities

  9. senior business management that does not take a proactive interest in the status of the project, but instead has to be compelled to take an interest

  10. nay-sayers and doubters in end-user constituencies who become increasingly vocal as the project progresses.

Mastering The Political: 10 Countermeasures

  1. Change your mindset. As a warehouse designer, you are a sociologist, marketeer, diplomat and technologist, probably in that order. Spend more time thinking about your constituencies and their needs, how you’ll market to those constituencies, and how you’ll establish treaties and working relationships with those constituencies than you spend drawing data models and technology architecture diagrams.

  2. Get comfortable with being frank, particularly with data set owners. Expect, and say that you expect, to find dirty, inconsistent, incomplete data. Help the data owners get comfortable with this as state-of-nature, rather than some failing on their part. Help them clean it up; in fact, point out to them that the warehousing or marting project is an ideal opportunity for them to get out from under their data burden, to get clean.

  3. Do the sociological analysis first. Know all your constituencies, data and access, technical and business, what their risk/reward profiles are, what they have to give up for the project to succeed, and what they gain if the project is successful.

  4. Develop the internal marketing plan before you do the first design. Know who you have to sell the project to, what the value proposition for each target audience is, and how you’ll know when they’re on board.

  5. Establish clear bilateral treaties or contracts with all the data-owner constituencies involved in the project before the design phase of the project is complete. Each treaty should specify who gets what from whom when and under what circumstances.

  6. Regularly repeat and reset expectations with each constituency, face to face. You’ll know you’re doing your job when the constituencies begin to notice that you’re repeating yourselves.

  7. Spend at least 10 minutes of every project meeting discussing the political climate surrounding the project, changes in the constituencies, status of internal marketing efforts, and late-breaking gossip and hearsay. When team members are uncomfortable discussing the politics of the project in company, help them to get over that discomfort.

  8. Spend at least 15 minutes of every meeting reviewing the work in progress in terms of (a) the business objectives for the project (which must be measurable in terms of the firm’s income statement) and (b) the user constituencies’ needs and their work environment.

  9. Have a formal process for logging, reviewing and either rejecting or accepting (with cost and time changes formally noted) all changes to project scope.

  10. Spend a lot of time in the face of the senior technical and business sponsors for the project. Make sure they know, every week, what the status of the project is, what the organizational roadblocks are, and what the project team expects the sponsors to do about those roadblocks. Use your sponsors like air cover to break down, via organizational fiat, resistances in the organization that the project team cannot remove through diplomacy.
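Countermeasure 5 asks that each treaty specify who gets what from whom, when, and under what circumstances. One way to keep a team honest about that is to write each treaty down as a structured record. The following is only a sketch in Python: the `DataTreaty` class, its field names and the sample values are hypothetical illustrations, not part of any established method.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataTreaty:
    """Hypothetical record of one bilateral data treaty (illustrative only)."""
    consumer: str       # who receives the data
    deliverable: str    # what is delivered
    provider: str       # from whom it comes
    due: date           # when it is delivered
    conditions: str     # under what circumstances

    def is_complete(self) -> bool:
        """A treaty is usable only if every text field is actually filled in."""
        text_fields = (self.consumer, self.deliverable, self.provider, self.conditions)
        return all(field.strip() for field in text_fields)


# Sample values are invented for illustration.
treaty = DataTreaty(
    consumer="warehouse project team",
    deliverable="monthly order history extract",
    provider="divisional IS group",
    due=date(1997, 9, 1),
    conditions="known data quality problems are reported to the provider first",
)
print(treaty.is_complete())
```

A record with an empty field is a treaty that has not really been negotiated yet, which is worth knowing before the design phase closes.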

Getting Into Politics

Information technology is, for better or for worse, social these days. The good old days of batch and online transaction processing systems design and deployment are gone; we buy those things now, from independent software vendors. The systems we have to build -- decision support systems, computer-supported collaborative work environments, workflow systems, intranets, extranets, whathaveyou-nets -- are all deeply and inextricably social applications of IT: computing applied to groups of people with power, status and a network of relationships.

That means, for better or for worse, that politics is an integral part of IT projects from here on in. Or out, depending on your perspective. And that, in turn, means IT professionals have no choice but to get into organizational politics, understand the forms, shapes and paths organizational politics takes, and become astute at navigating in a political environment. Not because politics is cool, or fun, but because politics is a feature of the landscape: the beast standing between us and the gate marked "successful project conclusion."


Footnotes

  1. Judith Hurwitz. "Preparing for the warehouse." DBMS, April 1996.

  2. Larry Greenfield. "Don't let data warehousing gotchas getcha." Datamation, March 1, 1996.

  3. Judith Hurwitz. "A pragmatic approach to data warehousing." DBMS, October 1995.

  4. Rose Cafasso. "Data minefields." PC Week, March 18, 1996.

  5. "Building a warehouse: more data and more disparate systems make it a great VAR market." VARBusiness, January 1, 1996.

  6. Julia Vowler. "When people come first." Computer Weekly, December 14, 1995; and "Problems in store." Computer Weekly, May 30, 1996.

  7. Data syndicates are member organizations in which members contribute their data to a syndicator and get back their data and those of other members of the group as well, in some unified form. TRW and A.C. Nielsen are syndicators.

  8. Data wholesalers purchase data from a supplier and resell it down value chains at a markup, with or without enriching it first. Dun and Bradstreet is a data wholesaler.

  9. Data enrichers typically take a firm's internal data, merge it with data from other sources and return it to the firm in question. Harte-Hanks is a data enricher.

  10. This phenomenon - "this is my computer" - has little to do with the firm's explicit policies about control over the desktop. The firm may very well tell its employees that the firm owns the desktop, but Microsoft and Novell and other desktop vendors repeat incessantly, in every advertisement and on every web page, the same basic statement: "this is your tool. Where do you want to go today?" That "you" is not the firm or the IS organization - it is the individual.

  11. Although I didn't have the guts to say it in public at the time, one of the fundamental reasons I proposed a two-tiered warehouse/mart enterprise DSS architectural model in 1994 was to partition the corporate IS organization's legitimate need for a warehouse infrastructure from the divisional or business unit IS organization's equally legitimate need for control over their own components of the DSS infrastructure: the marts. The treaty line, in the real world, between CPG wholesalers and warehousers, and CPG retail outlets, seemed to me a pattern we needed to adopt in large-scale DSS environments, in the name of getting on with the business of informating the business.
          

Last updated on 06-26-97 by Marc Demarest (marc@noumenal.com)

The authoritative source of this document is http://www.noumenal.com/marc/dwpol.html