‘Hurricane models: garbage in, gospel out’ (November 15)

The following is an analysis that is part of a graduate thesis.

This story was one of the most ambitious in the series. It contained several conclusions, all delivered in the introduction to the article, and then helpfully addressed in separate sections.

The focus of the story was the models that insurers and reinsurers use to estimate the likelihood of hurricanes striking a particular area and the magnitude of damage the storms are likely to cause. The conclusions St. John offered in this story were:

  1. “The catastrophe models at the core of just about every aspect of hurricane insurance, from rates to regulation, are flawed.” (3)

  2. Models examined after major storms have been found to be “stuffed with bad data” (5). A “‘garbage-in, gospel-out’ mentality has taken hold: Insurers plug in bad information about the property they insure yet accept the risk calculations spit out of the model as fact.” (5)

  3. “Models are being used not to seek the most accurate picture of hurricane risk but to chase the highest profits.” (6)

  4. “The realm entrusted to [models] is growing. Since Katrina, catastrophe models have been expanded to include costs for political meddling, government ineptness and even human greed.” (13)

The quality of justification given to each of these points varied widely.

‘Flawed from the start’

To her first conclusion, that “the catastrophe models … are flawed,” St. John offered three reasons. First, she wrote, “modelers have less than 50 years of reliable hurricane experience” on which to base their work (17). Second, “many assumptions must be made, producing results that span wide ranges” (17). Third, St. John provided anecdotes from insurers whose models either underestimated or overestimated what specific hurricanes would cost (21–24).

Was this support persuasive? No. For one, the reasoning contained an unstated and unjustified assumption: that assumptions themselves are not useful, or are “flawed.” Holding this assumption was the only way to link St. John’s comments about the existence of assumptions in models to her conclusion that “catastrophe models … are flawed.” Otherwise, saying that the models required assumptions did little more than state the obvious. Models predict; by nature they are tools, not truth-dispensers, a point that St. John herself made in the next section (29).

If St. John meant to say that the models make too many assumptions, or that the assumptions are fatally flawed in some way (as she might have meant, given her previous story about the four scientists gathered to finish RMS’s modeling software), then her argument would change. The argument would instead acknowledge the need for assumptions but disagree about which ones should be in play. Unfortunately, St. John provided no reason to question anything about the assumptions in the models.

Similarly, the statement that modelers have fewer than 50 years of “reliable hurricane experience” was presented without any context that would have given the statistic meaning. Such context might have included how many years of history are appropriate for building useful hurricane models (fifty-five years? five hundred?) and why.

Finally, the evidence presented in the form of failed predictions of models for some insurers would have been persuasive only if readers assumed that the insurers had no role to play in how the models computed their estimates. Readers would have had to assume that the models worked the same for everybody. But the next section demonstrated that this assumption would be false: Insurers are responsible for much of the data that the models rely on, so to show that some models got it wrong did not in itself demonstrate that the models were flawed. To conclude that, readers would have needed to also know that the data fed to the model were accurate. St. John’s next section was dedicated to showing that often those data were suspect, which put the conclusion currently at issue in question.

‘Garbage in’

St. John’s second conclusion, then, concerned the quality of data that insurers put into their models. She concluded that “models were stuffed with bad data,” and that a “garbage-in, gospel-out” mentality had taken hold in the insurance industry.

There perhaps wasn’t home-run evidence to support these claims, but there was pretty good proof. It came from the reporting of two internal studies by the modelers AIR Worldwide, which “discovered property values for commercial buildings off by as much as 90 percent” (31), and RMS, which, according to a report in another newspaper, “found an 80 percent error rate” (32). It was admittedly unclear how St. John knew of the AIR study. It was also difficult to know whether a finding that “commercial buildings” were misvalued “by as much as 90 percent” indicated that models were “stuffed” with errors. That said, these data provided support for at least a claim that bad data caused problems for how firms prepared for storms.

The quoted criticism by Karen Clark, the AIR founder, that “the minimal amount of data put into models” by companies was “lacking” (35) was, again, not fully dispositive of a claim that models contained bad data (it might suggest instead that the models’ problem was that there weren’t enough data in them, not that the data were bad). But it did go further toward supporting St. John’s more general claim that insurers were inputting “bad data” yet relying on the results anyway.

However, there was not much to be said in support of the conclusion that the “garbage” mentality had “taken hold.” The closest St. John came to evidence was a study by Ernst & Young claiming that reinsurers “commonly tack on surcharges as high as 25 percent to cover potentially missed risk” (37). But by itself this wasn’t indicative of anything, let alone that some mentality had “taken hold.” Readers might reasonably have questioned the assumption that would link the evidence to the conclusion, namely that a surcharge for “missed risk” was unusual. Perhaps it wasn’t. But readers were given no reason to think one way or the other.

‘Skewing results’

St. John followed her discussion of bad data with perhaps her boldest, most complex claim of the story: that at least some insurers chose “which models to use, or how to use [them]” based not on accurately predicting hurricane risk but on which model would allow them to seek higher rates, or to “chase the highest profits” (6).

This conclusion was potentially harmed by an ambiguity, albeit one not necessarily noticeable unless readers remembered St. John’s story from the previous day. In that story, the computer models discussed were of interest for their estimations of the path and frequency of future hurricanes. In the present story, however, those same models were apparently of interest for the way they were used to estimate the financial liability insurance firms faced in the event of a hurricane.[1] It was potentially ambiguous, then, whether St. John meant “models are being used not to seek the most accurate picture of hurricane risk” (6) from the point of view of the models (where will the hurricanes go and how often?) or from the point of view of the bookkeepers (how much will it cost according to this model?).

The distinction was not clarified in the story, but it gradually became clear that, at least in this conclusion, St. John was interested in whether insurers used models to make the most accurate prediction of their financial risk. Her evidence in support of this conclusion was fairly strong.

St. John wrote that “filings with Florida regulators show several insurers sought rate increases this year after using catastrophe models that left out loss-reducing details such as roof shape or storm shutters” (41). If true, this would demonstrate a manipulation of models in pursuit of higher rates, but readers still had to accept the argument, to some degree, on authority. Readers were not told exactly which documents were used to draw this conclusion or how they proved it.

Next, St. John wrote that “other insurers … modeled their policies at the ZIP code level rather than street address.” The reason to consider this practice a profit-producing one was that, according to “a former Lloyd’s of London executive,” the practice “generally increases the estimated loss” (42). St. John’s reliance on expert testimony as evidence here was slightly suspect. This “former executive” had not previously appeared in her stories, nor had Lloyd’s of London itself; it was unclear what the person had been an executive of (and so whether that time at Lloyd’s conferred expert status) and why he or she would require anonymity.

Finally, St. John discussed two instances in which insurers switched to models that “support[ed] a rate hike” (44). According to “confidential documents,” Allstate concocted a “Plan B” in which it would “switch to a later model version known to produce higher losses” if it didn’t think its rate increase request would be approved. St. John reported that Allstate told regulators that it “planned to eventually switch to the higher model anyway and it was just a matter of timing” (46). If this was the reason Allstate gave to justify its plan, then St. John definitely scored on the count of accusing insurers of choosing models based on profits rather than accuracy. Allstate had apparently not tried to justify its Plan B with science, something readers hadn’t been able to say with confidence about the other evidence St. John had presented for this conclusion so far.

So St. John came close to justifying her initial claim, which accused more than one insurer of such model-shopping. Another strong demonstration of the point using her next target, State Farm, would have put her over the top.

Unfortunately, she didn’t quite come through. She reported that State Farm switched to a model that “generates statewide loss estimates 18 percent higher than its previous model” (50). But she also reported that State Farm offered an accuracy-based reason for the switch: that the model best supported State Farm’s “evidence that home mitigation, such as storm shutters and modern roof design, is not as effective as regulators contend” (50). St. John did not counter this contention, and leaving it unchallenged left readers unable to determine whether it had merit, and hence whether State Farm chose its model at the expense of accuracy. The only comment she made about the new State Farm model was that it was “seldom-used” (49). On its own, that comment was not sufficient to disqualify the model on the merits. In fact, relied on by itself, it would have represented an ad populum fallacy: the claim that the model was bad only because other models were more popular.

‘Model creep’

Lastly, St. John offered a descriptive conclusion about the growth of modeling post-Katrina, saying that “the realm entrusted to the model is growing.” In the end, she referenced only one model, that of RMS, but everything she reported about the model supported her conclusion, although it was not clear how she obtained the details:

The model RMS created in 2006 calculated not only for a bundle of shingles, but for price-gouging by contractors, claims fraud by policyholders and sloppy work by harried adjusters.

It also created what it called “Super Cat” charges for major storms RMS believed would trigger a series of follow-on disasters, as did Katrina in New Orleans.

They include the economic meltdown of a community, botched disaster response, political interference with insurers, and unforeseen events, such as the collapse of the levees. The Super Cat category drove up insurance costs primarily for commercial policyholders. (53–55)

Any discomfort caused by the opacity of sourcing was somewhat relieved, however, by the inclusion of quotes and paraphrases about the model from RMS executives (67), insurance brokers (59), and regulators who had seen it (62). These attempts to predict and quantify human behavior were perhaps the most interesting part of the article, and, fortunately for readers, what St. John said about them was well documented.

  1. I thank Professor Craft for pointing out this distinction to me.  ↩
