National Weather Service United States Department of Commerce
Principal Investigator: Caitlin Bristol (NWS Pathways Student, North Central River Forecast Center)
Mentor: John Goff (Lead Meteorologist, NWS Burlington, VT)
July 20, 2020 at 2:30 PM


    1. Importance of Flash Flood Warnings

      Flash floods (FF) are one of the top weather-related hazards in the United States [Gourley et al. (2017)]. According to the National Weather Service (NWS) Glossary [NOAA (2012)], flash floods are rapid rises of water (above a predetermined flood level) within minutes to hours of the causative event. Flash flooding alone produces millions of dollars in damage each year. Forecasting FF remains one of the most challenging tasks for an operational NWS forecaster. Flash floods result from a complex interaction of hydrometeorological, hydrological, and hydraulic processes across various spatial and temporal scales, so forecasts must account for these coupled meteorological and hydrological phenomena [Mohammed et al. (2017)]. In the northeastern United States in particular, steep and variable terrain with shallow soils produces rapid runoff, which often leads to flash flooding. In the County Warning Area (CWA) of NWS Burlington, VT (BTV), flash flooding is a prevalent and prominent hazard.

      Fortunately, recent improvements in numerical model resolution and computational capability have increased the accuracy and lead times of flash flood warnings issued by NWS Weather Forecast Offices (WFOs). Hydrologists and hydrometeorologists are also researching better ways to communicate the complex issues involved in forecasting and warning for these events. One way to assess the effectiveness of forecasts is to set attainable performance standards.

      Under the Government Performance and Results Act (GPRA), the federal government evaluates the NWS by comparing its actual scores with stated performance goals. To assess BTV's performance, the GPRA flash flood goals (lead time and accuracy) were compared with the actual scores for the entire NWS, the NWS Eastern Region (ER), and BTV. From 2015 to 2019, BTV was below the GPRA lead time goal in four of the five years analyzed. It exceeded the national accuracy goal (Probability of Detection, POD) in two of the five years (Figure 1).

      Figure 1: Flash flood warning yearly lead time and accuracy (POD) trend from 2015 to 2019 (data from the Performance Management Web Portal).

      Results show that BTV has been below the national GPRA goal a portion of the time, while the national average is consistently above the accuracy goals. This may be due in part to the relatively low frequency of flash flood events in BTV's CWA (Figure 2).

      Figure 2: Total number of flash flood events from 2015 to 2019 (data from the Performance Management Web Portal).

      Only 2017 and 2019 had more than ten events. In the other years, events were so few that each missed event had a large impact on the overall performance score. As a result, a handful of events can negatively bias the results. Given BTV's low scores relative to the GPRA goals over the past five years, an improved modeling program may help forecasters better determine the likelihood of a flash flood event.
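The sketch below illustrates why small annual samples make the scores volatile. It computes a simple event-count POD (hits divided by hits plus misses) for a hypothetical low-event year and a hypothetical high-event year, each with exactly one missed event. This is a simplification. PMWP's POD is based on Percent of Event Warned, and the event counts here are illustrative assumptions, not BTV data.

```python
def event_pod(hits: int, misses: int) -> float:
    """Event-count probability of detection: hits / (hits + misses).

    Simplified illustration only; PMWP's POD is based on Percent of Event Warned.
    """
    total = hits + misses
    if total == 0:
        raise ValueError("POD is undefined when no events occurred")
    return hits / total


# Hypothetical years, each containing a single missed event
low_event_year = event_pod(hits=2, misses=1)    # 3 events in the year
high_event_year = event_pod(hits=11, misses=1)  # 12 events in the year

print(f"3 events, 1 miss:  POD = {low_event_year:.2f}")   # 0.67
print(f"12 events, 1 miss: POD = {high_event_year:.2f}")  # 0.92
```

The same single miss costs about a third of the score in the low-event year but less than a tenth in the high-event year.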

    2. Improvements in Flash Flood Warning and Forecasting

      Several models and programs have been developed to better inform WFO forecasters of flash flood potential. These include Flash Flood Monitoring and Prediction (FFMP), the Integrated Flood Observing and Warning System (IFLOWS), and the Site-Specific Hydrologic Prediction Model (SSHPM). The National Severe Storms Laboratory (NSSL) launched the Flooded Locations And Simulated Hydrographs (FLASH) project in early 2012. The primary goals of FLASH are to improve the accuracy and timing of flash flood warnings in order to save lives and protect infrastructure [NOAA National Severe Storms Laboratory (n.d.)]. The overall focus of this study is to examine FLASH Maximum Unit Streamflow (MUS) data for warned and unwarned flash flood events to determine its utility in flash flood operations at NWS BTV. At present, no other studies have investigated FLASH's utility in BTV operations.


    1. How FLASH works

      FLASH is a relatively new modeling system that forecasters can use for flash flood prediction. It is described as a suite of real-time tools that use weather radar-based rainfall estimates to force hydrologic models to predict flash floods. Specifically, FLASH was designed within the Ensemble Framework for Flash Flood Forecasting (EF5) (Figure 3). The ensemble structure ingests multiple data sources, ranging from rainfall observations to high-resolution Numerical Weather Prediction (NWP) forecasts [NOAA National Severe Storms Laboratory (n.d.)]. This framework also accommodates multiple model structures, parameter settings, and newly developed techniques for producing probabilistic outputs.

      Figure 3: Flowchart describing the EF5 system configured for the FLASH project. The red boxes indicate model forcings, blue boxes are model physics modules, and the purple box contains output products, adapted from [Gourley et al. (2017)].

      To balance the water budget in a catchment, FLASH ingests several inputs, including precipitation forcings and model physics modules (water balance models) (Figure 3) [Gourley et al. (2017)]. Precipitation forcing data provide estimates of precipitation location, total amount, and type. Water balance models such as the Coupled Routing and Excess STorage (CREST) model [Wang et al. (2011)] help route overland flow.
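A toy single-cell "bucket" model can sketch the idea of balancing a catchment water budget. Rainfall fills soil storage up to a fixed capacity, a constant evapotranspiration loss drains it, and any excess becomes runoff. This is a deliberately simplified, hypothetical illustration. The capacity, loss rate, and rainfall series are assumptions. It is not CREST or any FLASH module; those use distributed grids, infiltration curves, and flow routing.

```python
from typing import List, Tuple


def bucket_water_balance(
    rain_mm: List[float],
    capacity_mm: float,
    et_mm_per_step: float,
    initial_storage_mm: float = 0.0,
) -> Tuple[List[float], float]:
    """Toy saturation-excess bucket model for one cell (not CREST).

    Returns the runoff for each time step (mm) and the final storage (mm).
    """
    storage = initial_storage_mm
    runoff = []
    for rain in rain_mm:
        # Evapotranspiration removes water, but storage cannot go negative
        storage = max(storage - et_mm_per_step, 0.0)
        storage += rain
        # Water above the storage capacity cannot infiltrate and runs off
        excess = max(storage - capacity_mm, 0.0)
        storage -= excess
        runoff.append(excess)
    return runoff, storage


# Hypothetical hourly rainfall (mm) on a shallow soil with 20 mm capacity
rain = [5.0, 10.0, 15.0, 20.0, 0.0]
runoff, final_storage = bucket_water_balance(rain, capacity_mm=20.0, et_mm_per_step=0.5)
print(runoff)         # [0.0, 0.0, 9.0, 19.5, 0.0]
print(final_storage)  # 19.5
```

Lowering `capacity_mm` (a shallower soil) produces more runoff from the same rainfall. This matches the shallow-soil runoff behavior described for the northeastern United States.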

    2. Importance of analysis parameters

      FLASH produces a variety of rainfall and discharge products, each of which provides valuable information for flash flood forecasting. This study focuses on MUS because it is one of the most useful FLASH products: it highlights areas experiencing higher-than-normal flow. MUS is defined as the maximum simulated surface water flow normalized by drainage area, evaluated between 30 min before and 12 hr after the valid time, with units of m³∙s⁻¹∙km⁻². This study has three aims:

      - to determine the minimum, median, and maximum MUS values for warned flash flood events;
      - to establish thresholds above which flash flooding becomes more likely;
      - to determine whether the data are useful in areas of poor radar coverage (hereafter, beam-blocked regions).

      In BTV's CWA, these regions include most of eastern VT and the Adirondack Mountains, where terrain blocks the lowest scans of the KCXX radar in Colchester, VT (Figure 4).
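The following is a minimal sketch of how a maximum unit streamflow value could be computed for one grid cell under the definition above. Simulated discharge is divided by the cell's drainage area, and the maximum is taken over the window from 30 min before to 12 hr after the valid time. The function name, discharge series, and drainage area are hypothetical; FLASH computes MUS operationally on its own grid.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def max_unit_streamflow(
    discharge: Sequence[Tuple[datetime, float]],
    drainage_area_km2: float,
    valid_time: datetime,
) -> float:
    """Maximum unit streamflow (m^3 s^-1 km^-2) in [valid - 30 min, valid + 12 h].

    `discharge` holds (time, simulated discharge in m^3/s) pairs for one cell.
    """
    if drainage_area_km2 <= 0:
        raise ValueError("drainage area must be positive")
    start = valid_time - timedelta(minutes=30)
    end = valid_time + timedelta(hours=12)
    # Normalize each in-window discharge value by the drainage area
    unit_flows = [q / drainage_area_km2 for t, q in discharge if start <= t <= end]
    if not unit_flows:
        raise ValueError("no discharge values in the MUS window")
    return max(unit_flows)


valid = datetime(2018, 6, 1, 12, 0)  # hypothetical valid time
series = [
    (valid - timedelta(hours=1), 40.0),    # before the window, ignored
    (valid, 60.0),
    (valid + timedelta(hours=2), 150.0),   # peak flow inside the window
    (valid + timedelta(hours=13), 500.0),  # after the window, ignored
]
print(max_unit_streamflow(series, drainage_area_km2=50.0, valid_time=valid))  # 3.0
```

Normalizing by drainage area is what lets small headwater cells and large basins be compared on the same scale.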

      Figure 4: Beam blockage (yellow polygons) for BTV's KCXX radar at (a) 0.5° and (b) 1.5° base reflectivity. The CWA is outlined in white and counties in green; NY flash flood events are shown as orange points and VT flash flood events as red points. Data are from the NCEI Storm Events Database.


    1. Data Collection

      This study focuses on flash flood events rather than areal flood or river flood events. It also does not examine antecedent conditions such as prior 24-hour rainfall or soil moisture. Events were collected from the NWS Performance Management Web Portal (PMWP), which is used for NWS storm-based warning verification. PMWP ingests warnings disseminated through the Advanced Weather Interactive Processing System (AWIPS) for verification. Official verification in PMWP is provided by each office's verification focal point. The focal point reviews the available data (local storm reports, local news, social media, and other reports) and determines whether the warning verified. PMWP also provides verification statistics for the warnings issued. This study analyzes data from June 1, 2015 to December 31, 2018, a period selected based on the availability of MUS data.

    2. Data Analysis

      All events, warned and unwarned, were analyzed for Probability of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), and lead time, all of which were obtained from PMWP. Each score helps determine whether events are being properly warned.

      For example, POD gauges how well the forecaster detects hazardous events. It is computed as the average Percent of Event Warned (PEW), where every point along the path of the event is assigned warned (1) or unwarned (0).

      FAR is important because forecasters must balance over-warning for possible dangers against the hazard of failing to warn when the threat materializes. Low false alarm ratios are always preferred, but not at the cost of under-warning for actual danger. FAR is calculated by dividing the number of false alarms by the total number of warnings.

      CSI accounts for both false alarms and missed events and is, therefore, a more balanced score. However, CSI is somewhat sensitive to event climatology and tends to give poorer scores for rarer events such as flash floods. Because BTV has a limited number of FF events, CSI is less informative here, and POD is therefore the focus of this study.
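The scores described above can be sketched as small functions. PEW and POD follow the description in the text: PEW is the fraction of event points warned, and POD is the mean PEW over events. FAR follows the text's definition of false alarms divided by total warnings. CSI uses the standard contingency-table form, hits / (hits + misses + false alarms). Whether PMWP applies exactly these formulas is an assumption, and the events and counts are hypothetical.

```python
from typing import Sequence


def percent_event_warned(point_flags: Sequence[int]) -> float:
    """PEW: fraction of points along an event path flagged warned (1) vs unwarned (0)."""
    if not point_flags:
        raise ValueError("event has no points")
    return sum(point_flags) / len(point_flags)


def pod_from_pew(pew_values: Sequence[float]) -> float:
    """POD as the average PEW over all events, as described for PMWP."""
    return sum(pew_values) / len(pew_values)


def false_alarm_ratio(false_alarms: int, total_warnings: int) -> float:
    """FAR: number of false alarms divided by total warnings issued."""
    return false_alarms / total_warnings


def critical_success_index(hits: int, misses: int, false_alarms: int) -> float:
    """CSI (standard form): penalizes both missed events and false alarms."""
    return hits / (hits + misses + false_alarms)


# Hypothetical events: fully warned, half warned, and entirely missed
events = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
pod = pod_from_pew([percent_event_warned(e) for e in events])
# Hypothetical warnings: 2 verified (hits) plus 1 that did not verify
far = false_alarm_ratio(false_alarms=1, total_warnings=3)
csi = critical_success_index(hits=2, misses=1, false_alarms=1)
print(f"POD={pod:.2f} FAR={far:.2f} CSI={csi:.2f}")  # POD=0.50 FAR=0.33 CSI=0.50
```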

      Every FF event was visually inspected and then analyzed for its maximum, mean, and minimum MUS. Each event was examined before, during, and after its occurrence to determine the peak MUS, with time zero defined as the issuance time of the flash flood warning (FFW). The MUS pixels within the warning polygon (or within the event area, for unwarned events) were averaged to determine a mean value for the event (Figure 5). Events were grouped by warning type (verified, unverified, or missed) to determine whether mean MUS was related to whether the FF was warned. Events were also analyzed for their proximity to beam-blocked areas, such as most of eastern Vermont. Finally, the MUS threshold above which flash flooding becomes likely was determined.
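The pixel summary described above can be sketched by masking MUS pixel centers to the warning polygon and taking the minimum, mean, and maximum of the values inside. The ray-casting point-in-polygon test is a standard technique swapped in for illustration; an operational analysis would more likely use GIS software. The polygon and pixel values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: toggle on each polygon edge crossed by a ray toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mus_stats_in_polygon(
    pixels: List[Tuple[float, float, float]], polygon: List[Point]
) -> Dict[str, float]:
    """Min, mean, and max MUS for pixel centers (x, y, mus) inside the polygon."""
    values = [mus for x, y, mus in pixels if point_in_polygon(x, y, polygon)]
    if not values:
        raise ValueError("no MUS pixels inside the polygon")
    return {"min": min(values), "mean": mean(values), "max": max(values)}


# Hypothetical square warning polygon and MUS pixels (m^3 s^-1 km^-2)
warning = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
pixels = [(1.0, 1.0, 0.5), (2.0, 2.0, 3.5), (3.0, 1.0, 1.0), (5.0, 5.0, 9.0)]
print(mus_stats_in_polygon(pixels, warning))  # the 9.0 pixel lies outside
```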

      Figure 5: Verified FF event on 07/24/2017 beginning at 13:15 UTC. (a) Warning area (red polygon) and verified event area (blue polygon) (data from the Performance Management Web Portal). (b) MUS pixels and their corresponding MUS values during the event (data from FLASH).


    1. Performance Management Web Portal

      The POD, FAR, CSI, and lead time results from PMWP are shown in Table 1. POD is about 85%, with a FAR of only 34%. The average PEW is 96%. This means that, for warnings considered “hits,” 96% of the observed flooding occurred inside the warning polygon. This matters because warnings for verified events covered over 90% of the event, so people and property within the event area were likely warned. The contingency table of warned versus observed events is shown in Table 2, and pie charts of warning and event counts are shown in Figure 6.

      Table 1: Flash flood warning verification summary statistics; colors are associated with event and warning type (data from the Performance Management Web Portal).

      Table 2: Contingency table for flash flood events warned and observed; colors are associated with event and warning type (data from the Performance Management Web Portal).

      Figure 6: Pie charts of flash flood warning verification summary statistics for warnings and events; colors are associated with event and warning type (data from the Performance Management Web Portal).

    2. FLASH

      The MUS values for verified, unverified, and missed events in the BTV CWA (06/01/2015-12/31/2018) are shown in Figure 7. Overall, the median and maximum MUS values for verified events are larger than those for both unverified and missed events. Missed events have the smallest range in both maximum and median MUS, while the maximum MUS of unverified events has the largest range. Figure 7 also shows considerable overlap between the maximum MUS of verified and unverified events, but very little overlap between their medians.

      Figure 7: MUS for verified, unverified, and missed events; colors are associated with event and warning type. Bold italic values represent the median, and italic values with an "x" represent the mean; N is the number of events of each type. Boxes span the 25th-75th percentiles and whiskers the 10th and 90th percentiles.
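The box-and-whisker statistics in Figure 7 can be reproduced for a set of per-event MUS values with Python's `statistics.quantiles`. The interpolation method used for the figure is not stated. This sketch assumes linear interpolation between order statistics (the `"inclusive"` method), and the MUS values are hypothetical.

```python
from statistics import mean, quantiles
from typing import Dict, Sequence


def box_whisker_summary(values: Sequence[float]) -> Dict[str, float]:
    """Percentiles laid out like Figure 7: whiskers 10th/90th, box 25th-75th."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # 99 cut points; element k-1 is the k-th percentile
    pct = quantiles(values, n=100, method="inclusive")
    return {
        "p10": pct[9],
        "p25": pct[24],
        "median": pct[49],
        "p75": pct[74],
        "p90": pct[89],
        "mean": mean(values),
    }


# Hypothetical per-event maximum MUS values (m^3 s^-1 km^-2)
hypothetical_max_mus = [0.5 * k for k in range(1, 12)]  # 0.5, 1.0, ..., 5.5
print(box_whisker_summary(hypothetical_max_mus))
```

Other interpolation conventions (for example, the default `"exclusive"` method) give slightly different percentiles for small samples like these.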


    1. Beam blockage

      Beam blockage is one of the major issues in BTV’s CWA, and the four missed events alone accounted for $1.745 million (USD) in damage. The MUS for missed events was, on average, smaller than for verified events (Figure 7). The POD of 0.85 (Table 1) is already relatively high for an office with significant beam blockage across most of its CWA, although incorporating MUS may improve it further. To determine whether the missed events were related to beam blockage, each missed event was plotted against the KCXX beam blockage at 0.5° and 1.5° base reflectivity (Figure 4a and 4b). Three of the four missed events (75%) fall within the 0.5° beam blockage, while only one of the four (25%) is beam blocked at 1.5°. It could therefore be argued that the missed events were largely due to poor precipitation estimates in these areas. However, because most of the missed events are beam blocked at 0.5° but not at 1.5°, beam blockage may not be the only factor contributing to missed events. Variable and steep terrain, event duration, and precipitation intensity can all make it harder for a forecaster to determine whether flash flooding will occur. This is reflected in the small MUS values for the missed events (Figure 7).

    2. MUS utility in BTV operations

      The MUS threshold above which flash flooding becomes likely was determined to be around 3 m³∙s⁻¹∙km⁻². This threshold was obtained by averaging the maximum MUS of the verified and missed events, since both are observed flash floods. When missed and verified events are analyzed separately, the average maximum MUS is only 2 m³∙s⁻¹∙km⁻² for missed events and 4 m³∙s⁻¹∙km⁻² for verified events. Maximum values were used instead of medians because the median for most events was 0.5 m³∙s⁻¹∙km⁻². That low value is likely reached during typical heavy precipitation events and therefore would not indicate flash flooding. Nevertheless, these results highlight a large knowledge gap in flash flood prediction due to model limitations and the limited availability of long-term data.
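The threshold step can be sketched as an average of per-event maximum MUS over all observed flash floods (verified plus missed). The values below are hypothetical, and the function name is an illustrative assumption. The sketch also computes the simple average of the two group means, since the exact weighting used for the threshold is not specified.

```python
from statistics import mean
from typing import Sequence


def pooled_threshold(verified_max: Sequence[float], missed_max: Sequence[float]) -> float:
    """Average per-event maximum MUS over all observed flash floods."""
    return mean(list(verified_max) + list(missed_max))


# Hypothetical per-event maximum MUS values (m^3 s^-1 km^-2)
verified = [3.5, 4.0, 4.5]
missed = [1.5, 2.5]

print(mean(verified), mean(missed))          # 4.0 2.0
print(pooled_threshold(verified, missed))    # 3.2 (each event weighted equally)
print(mean([mean(verified), mean(missed)]))  # 3.0 (each group weighted equally)
```

The pooled average weights every event equally, so it leans toward whichever group has more events. The mean of group means weights the verified and missed groups equally. The two agree only when the groups are the same size or have the same mean.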

    3. Further Research

      The study period is short. A dataset longer than 3 years, with a larger sample than the 36 events analyzed here, is recommended to obtain more statistically significant results. MUS highlights areas of higher-than-normal flow, but elevated flow does not always mean a flash flood will occur, as shown by median MUS values as low as 0.5 m³∙s⁻¹∙km⁻² across events. Figure 7 shows the same pattern: the maximum MUS values of unverified and verified events largely overlap, but their medians do not. Unverified events may contain several pixels with high MUS values while most of the event area does not, so FF did not occur. Conversely, the additional water throughout verified events (and therefore their higher median MUS values) may be the determining factor that pushes an event past a “threshold” to become a FF. Missed events had the smallest MUS values. Factors such as antecedent catchment soil conditions (saturated or unsaturated) and the duration and intensity of precipitation may therefore help better determine whether flash flooding is likely in BTV’s CWA.


    Radar beam blockage is one of the major issues for general forecasting and warning operations in BTV’s CWA. It can pose a significant problem for flash flood prediction because of potentially poor precipitation estimates. The overall performance of BTV flash flood warning operations is satisfactory based on a POD of 0.85 (Table 1). However, GPRA lead time goals are generally not met, and year-to-year accuracy percentages are highly variable (Figure 1). The latter can be partially explained by the low number of events over the 3-year period of this study. Recommendations for improvement include additional training, review of past events, supplemental research, and increased communication with other ER offices regarding their flash flood forecasting methods.

    Despite radar beam blockage over regions of eastern VT and the Adirondacks, BTV’s POD for flash flood warnings is relatively high. The missed events could be due to other factors, such as antecedent soil conditions and terrain type and steepness, all of which play a large role in FF events. Another possibility is that the events were missed because of their generally lower MUS values (if the warning forecaster was consulting the data). Nevertheless, FLASH data can still inform forecasters of the likelihood of flooding, with a maximum MUS value of around 3 m³∙s⁻¹∙km⁻² marking where flash flooding becomes likely. With continued validation and increased long-term data availability, MUS has great potential to be utilized in flash flood operations at NWS BTV.


Bilder, M., and E. Johnson, 2012: Appendix B, State of The National Weather Service. Commission on the Weather and Climate Enterprises Summer Community Meeting discussing the State of the Enterprise.

Gourley, J.J., and Coauthors, 2017: The FLASH Project: Improving the Tools for Flash Flood Monitoring and Prediction across the United States. Bull. Amer. Meteor. Soc., 98, 361–372.

Hardy, J., 2019: Choosing Your Precipitation and Guidance Sources. NWS Warning Decision Training Division. Commerce Learning Center.

Koren, V., J. Schaake, Q. Duan, M. Smith, and S. Cong, 1998: PET Upgrades to NWSRFS, Project Plan. Unpublished Report.

Mohammed, K., A.K.M. Islam, and M.J.U. Khan, 2017: Flash Flood Forecasting in the Northeast Region of Bangladesh Using Artificial Neural Network.

NOAA, 2002: 3-SAC-SMA Conceptualization of The Sacramento Soil Moisture Accounting Model. NWS. Accessed February 28, 2020, rfs: 23sacsma.wpd.

NOAA, 2012: Hydrologic Services Program, Definitions and General Terminology. National Weather Service Manual 10-950, 5 pp.

NOAA National Severe Storms Laboratory, n.d.: “Flooding.” Accessed February 2, 2020.

Performance Management Web Portal, 2019: The Performance & Evaluation Branch, Office of the Chief Operating Officer (OCOO).

Wang, J., and Coauthors, 2011: The coupled routing and excess storage (CREST) distributed hydrological model. Hydrol. Sci. J., 56(1), 84–98.