Statistical Properties of Electricity Generation from a Large System of Wind Plants and Demand for Fast Regulation

Experimental data of total wind generation, recorded at 5 minute intervals and published by the Bonneville Power Administration for the years 2007 to 2013, were analyzed on a year by year basis. All data were normalized to total installed power of wind plants. Statistical distribution functions were obtained for the following wind generation-related quantities: total generation as percentage of total installed capacity; change in total generation power in 5, 10, 15, 20, 25, 30, 45, and 60 minutes as percentage of total installed capacity; duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than certain pre-specified level. The statistical distributions obtained from the data were used to devise simple, yet accurate, theoretical models. The models presented here can be utilized in analyses related to power system economics/policy, because they describe availability of wind energy resource in simple statistical terms relevant to interactions of wind generation with electricity system, and electricity markets. After a brief display of the models, the article concentrates on static properties of the observed system’s electricity generation related to its capacity credit, as well as on dynamic properties related to the demand for fast regulation (i.e., secondary and fast tertiary reserve). Both properties are important for technical planning of future electricity systems, as well as rational design of policy measures.


INTRODUCTION
One of the most notable technical problems with massive integration of renewable sources of electricity into the existing power grid comes from the fact that the most favoured technologies, i.e. wind and solar, exhibit substantial production variability on all time horizons. Power system operation is mostly concerned with not-fully-predictable short-term variations in power generation, which occur within several minutes. Technical challenges are always complemented by economic ones. Integration of renewables does cost, and the costs of variability regulation do rise with installed power. For these reasons, as well as a number of others, it is very important to learn about the nature of wind phenomena and the associated electricity generation, which naturally follows the intensity of wind at a plant's location. To give an example of why knowledge of wind generation statistics matters not merely for the cost of short-term balancing but also for the security of grid operation, one can point to Morel et al. [1], who introduced a strategy to improve the transient stability of a power system with a high penetration of intermittent renewable sources. The stability degradation arises mainly from the highly variable outputs of these types of generators during normal and abnormal operation.
Next, Ji et al. [2] analyze the influence of wind power variability on the unit commitment problem and solve it using the Gravitational Search Algorithm. Rotich et al. [3] discuss wind resource assessment and forecast planning with neural networks trained with real wind speed data. Song, Jiang and Zijun [4] discuss a Markov-switching model in wind speed forecasting, whereby utility-wide real wind speed data were used to develop the model and check its performance. Wan et al. [5] analyze probabilistic forecasting of wind power generation using an extreme learning machine, with one-hour experimental time series of generation data from a real wind plant location in Australia to test the model. Zhang, Wang and Wang [6] give a review of methods of probabilistic forecasting of wind power generation, whereby accurate experimental data are needed both for training certain types of algorithms and for testing their performance. Jung and Broadwater [7] provide a similar type of review. Cannon et al. [8] deal with meteorological reanalysis data sets to address infrequent, extreme situations in wind production variability. In this paper, our analysis does not pertain to extreme realizations of stochastic processes of wind power generation in particular, but does point to the significance of distribution tails.
On the side of economic and social optimization of power system development with increased variability of renewables, there is also a body of literature that points to the importance of temporal analysis of wind generation. Wen et al. [9] investigate optimal economic allocation of energy storage systems considering wind power distribution. Zakariazadeh, Jadid and Siano [10] investigate optimal economic-environmental energy and reserve scheduling of smart distribution systems using a mathematical programming approach, where the authors model wind speed with the Rayleigh distribution. Mostafaeipour et al. [11] provide an analysis of wind energy potential and economic evaluation in Zahedan, Iran, whereby they use the Weibull distribution to model the wind variability, and consequently estimate Weibull distribution parameters from measured wind speed data. Sun et al. [12] propose a random fuzzy model to express probabilistic and possibilistic uncertainties of wind speed on adequacy of a power generation system, where wind is assumed to be Weibull-distributed. In a review by Chauhan and Saini [13] on integrated renewable energy system based power generation for stand-alone applications, the wind part of the system is modeled using the Weibull distribution. In the analysis of integrated scheduling of renewable generation and demand response programs in a microgrid, Mazidi et al. [14] make use of the Rayleigh distribution to model the wind part of the generation system.
Adaramola, Agelin-Chaab and Paul [15] use measured wind speed data along the coast of Ghana to derive parameters of the theoretical Weibull distribution of speeds. Wang et al. [16] review and analyze estimation methods for offshore extreme wind speed statistical parameters and wind energy resources. Khahro et al. [17] use measured data of wind speeds from a site in Pakistan to derive Rayleigh, or Weibull, distribution parameters for the location. Pishgar-Komleh, Keyhani and Sefeedpari [18] provide a similar type of analysis for a geographic region in Iran. Shu, Li and Chen [19] give an analysis of measured wind speed data at five weather stations in Hong Kong in one-minute temporal resolution, and estimations of Weibull distributions associated with the observed phenomena. Ouarda et al. [20] provide a comparative analysis of goodness-of-fit of different theoretical statistical distributions to actual measured wind speed data, to conclude that the two-parameter Weibull model performs the best. Fazelpour et al. [21] also provide the hourly, diurnal, seasonal, monthly, and annual wind speed variations, and employ a Weibull statistical model of wind speed.

Research problem
When studying temporal characteristics of wind generation, one can take two generic courses: to examine wind speed features and convert them into power production features by technical analysis, or to observe power production directly as a function of time. Both approaches are legitimate and have their justifications; however, the former is more frequently encountered in the literature. As directly measured power generation data have been analyzed less often so far, it is useful to observe and describe generation from a fairly large wind plant system.
The primary research goal was to obtain simple theoretic distribution functions to model actual statistical distributions of wind generation-related phenomena with sufficient accuracy, so that they can be used for various analyses in the fields of either power system operation, or perhaps power system economics/policy. The quantities of interest in this research were: total power generated within a (relatively) large system, and short-term temporal changes in generated power. They are important because they require flexible generation capacities, usually provided by conventional generating units (e.g. hydro plants with water reservoirs, or gas-fired plants), to be kept out of the energy market in order to be able to physically balance the intermittency introduced by certain renewable sources, like wind or solar.
Another research goal relates to the fact that publicly available electricity generation time series often contain data recorded in one hour intervals, which is too coarse for analyses of demand for ancillary services, especially secondary and fast tertiary regulation, caused by intra-hour generation variability. Thus, the question of whether one can estimate the statistical parameters of, say, 15 minute power variations from available 60 minute data was found worth investigating.
For a discussion of various methods for capacity credit calculation, see [22]. The present paper deals with a parameter simply called "intrinsic capacity", defined as the percentage of the total installed power of a group of generation facilities which can be counted on with a certain default risk.
Next, the following statistical measures to describe the dynamic behavior of wind generation systems on the longer-term scale (i.e., in time periods from several hours up) shall be defined:
• The level-crossing rate at a certain pre-specified level: the number of times the generation power drops below a certain pre-defined level in a year;
• Distribution of duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than a certain pre-specified level: the level-crossing rate itself does not sufficiently describe longer-term dynamics of intermittent generation. However, at the same pre-specified level, one can determine the number of intervals during which the total power output is below that level, as well as the duration of each of these intervals. Therefore, one can compute their statistical distribution, as well as other descriptive statistical parameters, like expectation, variance, etc.
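The two measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the made-up sample series, and the choice to count a run that is already below the level at the start of the series as one crossing are not taken from the paper.

```python
import numpy as np

def below_level_intervals(power_pct, level_pct, step_minutes=5):
    """Count downward level crossings and measure below-level interval durations.

    power_pct: generation readings in % of installed capacity (equally spaced).
    Returns (number of below-level intervals, durations in minutes).
    """
    below = np.asarray(power_pct, dtype=float) < level_pct
    # Pad with False on both sides so that runs touching either end are closed.
    padded = np.concatenate(([False], below, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)    # first reading of each below-level run
    ends = np.flatnonzero(edges == -1)     # first reading after each run
    durations = (ends - starts) * step_minutes
    return len(starts), durations

# Made-up readings, not BPA data.
series = [12, 8, 4, 3, 6, 9, 2, 2, 2, 7]
crossings, durations = below_level_intervals(series, level_pct=5)
print(crossings, durations.tolist())
```

Applied to a full year of 5 minute readings, the first return value would correspond to the level-crossing rate per year, and the second to the sample from which the duration distribution is estimated.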

Structure of data
All the data analyzed in this research were taken from the web page of the Bonneville Power Administration (BPA), a federal nonprofit agency and power system operator based in the Pacific Northwest region of the United States of America. The BPA makes them freely available to the general public. For more information on BPA, visit http://www.bpa.gov. The Excel files with historic data of wind generation in 5-minute increments from 2007 on are available at the address: http://transmission.bpa.gov/business/operations/wind/. At the same address, a number of other data sets are also available, but for the purpose of this research only the wind generation data were used.
The system of wind generation plants under the jurisdiction of the BPA has been growing ever since the first plant was put into operation in 1998. From the beginning of 2007 onward, the system grew to 4,515 MW.
For the reader's convenience, Table 1 gives a brief overview of these data. In this research, the detailed schedule was used to normalize the wind generation power to the total installed capacity. Most of the contributing wind plants are situated in a relatively small mountain area of roughly 50 × 60 kilometers, except for several plants situated at a somewhat greater distance. An analysis was performed to see whether the parameters of the statistical distributions obtained were related to changes in the spatial distribution of the plants over time. No apparent correlations were found.

METHODS
The basis for statistical analyses presented in this paper was a time series of wind generation power, measured in 5 minute increments during the years 2007 to 2013. These readings were normalized to total installed power of wind plants in the same area, as described in previous sections. Consequently, all the data prepared in this way belonged to the interval [0, 100%]. Statistical analyses were performed on a year-by-year basis. Therefore, in any given year the total number of experimentally obtained data points was equal to N = 12 intervals per hour × 24 hours per day × 365 days per year = 105,120 readings per year (or 105,408 for a leap year).
Statistical distributions of experimental data were estimated in the following way: the domain containing the values between the experimentally established minimum and maximum of the population was divided into 100 equally spaced bins. The membership of each bin was counted to establish the frequency distribution, as well as the cumulative frequency distribution, of the population. By dividing the two by N, the distribution density function, as well as the probability distribution function, were obtained.
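The binning procedure described above might look as follows in Python with NumPy. This is a minimal sketch: the function name is hypothetical, and the uniform random sample merely stands in for one year of normalized readings.

```python
import numpy as np

def empirical_distribution(values, bins=100):
    """Bin the data between its minimum and maximum and normalize by N."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(
        values, bins=bins, range=(values.min(), values.max())
    )
    # Relative frequency per bin (counts divided by N). Dividing additionally
    # by the bin width would give a density in the strict sense.
    freq = counts / values.size
    # Cumulative relative frequency, evaluated at the right edge of each bin.
    cdf = np.cumsum(freq)
    return edges, freq, cdf

# Synthetic stand-in for one non-leap year of readings (N = 105,120), not BPA data.
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 100.0, size=105_120)
edges, freq, cdf = empirical_distribution(sample)
```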
The formulae presented in Table 2 were derived in the following way: after a careful inspection of the experimentally obtained distribution functions, and based on the authors' experience in data fitting, candidate theoretical models (the formulae from Table 2, see next section) that could reproduce the experimental distributions sufficiently well were selected by successive trial and error. The models found in this way were then subjected to least-squares optimization computer routines.
Once the parameters of the optimal distribution were estimated, goodness-of-fit assessments between experimental and theoretical distribution functions were carried out. Given the large number of individual readings, N > 100k, and a relatively small number of degrees of freedom (typically, 97) as compared to N, the Chi-square goodness-of-fit test could not be passed with sufficient statistical significance in any of the cases. As the Chi-square test [23] depends heavily on the size of the population, experimentally obtained distribution density functions would have to be almost perfectly smooth to pass that kind of test. Yet, this was not the case. Instead, an approach based on the distance between empirical and theoretical probability distribution functions, that is, the well-known Kolmogorov-Smirnov test, was taken [24].
Unlike other distribution functions, which were evaluated in 100 points each, the distribution functions for duration of intervals with total generated power lower than a certain pre-specified level were evaluated in 30 points (i.e., the interval durations were distributed in 30 bins) because the total number of data points was merely between 300 and 400, and it was different for each year and each pre-specified level.

RESULTS
For the sake of brevity, statistical distributions derived from raw data in this research, as well as other quantitative relations, are stated in Table 2. Methods of testing goodness-of-fit of the distributions will not be presented here, although some results will be displayed. The details of the methods used can be found in [25], or obtained from the corresponding author upon request. The raw data used for analyses can be obtained from the authors, too.
Critical values for these tests depend, however, on the type of probability distribution. Therefore, the simplest possible approach was taken: the empirical distribution was regarded as fitting a theoretical one well enough if the absolute value of the Kolmogorov-Smirnov test statistic was 0.02 or less [26]. The critical values for the Kolmogorov-Smirnov test can also be found in [26]. For completely specified continuous distributions and a statistical significance level of 1 − α, with the number of bins, b, larger than or equal to 35, these values can be estimated by applying the formula $\left[-0.5\,\ln(\alpha/2)/b\right]^{0.5}$. They were used as a proxy to assess whether it was reasonable to regard our 0.02 criterion as adequate for evaluation of goodness-of-fit. For 1 − α = 0.01 and b = 100, the critical value is 0.163. For 1 − α = 0.01 and b = 35, the critical value is 0.269. Obviously, the 0.02 "critical value" is much more stringent than these, and despite the fact that it was not possible to perform a fully rigorous Kolmogorov-Smirnov analysis, it is safe to assume that no mistake was made by inferring that an empirical curve which stays within the ±0.02 limit of a theoretical one is modeled well enough by the latter.
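As a rough numerical check of the approximation quoted above, the following Python sketch evaluates the formula and the maximum distance between two distribution functions. The function names are hypothetical, and passing 0.01 as the argument of the logarithm term is an assumption; with that choice the formula reproduces the quoted 0.163 for b = 100.

```python
import math
import numpy as np

def ks_critical_value(alpha, b):
    """Approximate critical value [-0.5 * ln(alpha / 2) / b] ** 0.5."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0) / b)

def ks_distance(empirical_cdf, model_cdf):
    """Largest absolute gap between two CDFs sampled at the same points."""
    gap = np.asarray(empirical_cdf, dtype=float) - np.asarray(model_cdf, dtype=float)
    return float(np.max(np.abs(gap)))

crit_100 = ks_critical_value(0.01, 100)
print(round(crit_100, 3))

# Made-up CDF values, only to show the +/-0.02 acceptance rule in use.
distance = ks_distance([0.10, 0.50, 1.00], [0.11, 0.49, 1.00])
fits_well_enough = distance <= 0.02
```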
The results are presented in Figures 1 to 20, and Tables 3 to 8, and brief explanations for them are listed systematically in Table 2, which is to be read together with methodical explanations of this and the previous section.
Figure 3 gives the Kolmogorov-Smirnov test statistic values for the whole domain of values (0 to 100% of total installed wind plant power). The ±0.02 borders are also shown. Note that the models are generally better in the lower and the middle part of the distribution function domain, while they do not reproduce the upper tail very well. For practical purposes, the distribution function for values of generation near 100% of total installed capacity may not be so important, because system reliability is usually not jeopardized when there is an abundance of energy in the system. Thus, we can concentrate on the lower tail of the distribution. We found that the same type of model is appropriate. The corresponding parameters are listed for the years 2007 to 2013 in Table 3 as well. Figure 4 gives the 2012 distribution (both measured and theoretic) as an example. The goodness-of-fit analysis indicated a very good fit (Figure 5).

2.
r is cumulated time normalized to 1 year, thus r ∈ [0, 1]; $\mathrm{VaR}(r) = e^{-B/A}\, r^{1/A}$ (Figure 7). Variable r is defined on the domain [0, 1] and represents the total cumulated time, normalized to one year, during which the total generated power stays below a certain level, p, also normalized to the total installed wind plant power. In other words, p(r) is a Value-at-Risk (VaR) function, where p is the value and r is the risk. Figure 7 gives the 2012 VaR(r) as an example. For small values of p, the percentage of the total installed power of a group of generation facilities which can be counted on up to a certain default risk, r, increases very slowly with r. That is, for small p, VaR increases slowly with risk. Note that this property, which differs from many familiar VaR functions, is not favourable as regards capacity credit.
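To make the VaR relation concrete, here is a small Python sketch that evaluates VaR(r) = e^(−B/A) r^(1/A) and its algebraic inverse, r = e^B p^A. The function names and the parameter values are placeholders chosen for easy arithmetic; they are not fitted values from Table 3, and whether p is expressed as a fraction or in percent depends on how A and B were fitted.

```python
import math

def intrinsic_capacity(r, A, B):
    """VaR(r) = exp(-B / A) * r ** (1 / A) for a default risk r in [0, 1]."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return math.exp(-B / A) * r ** (1.0 / A)

def default_risk(p, A, B):
    """Inverse relation obtained by solving VaR(r) = p for r: r = exp(B) * p ** A."""
    return math.exp(B) * p ** A

# Placeholder parameters (hypothetical, not taken from the paper).
A_demo, B_demo = 0.5, 0.0
p_demo = intrinsic_capacity(0.1, A_demo, B_demo)   # equals 0.1 ** 2 for these values
r_back = default_risk(p_demo, A_demo, B_demo)      # recovers the risk of 0.1
```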

3.
Modulus of change in generation power, normalized to the total installed wind plant power. Short-term temporal changes in generation power are a very important feature of any generation sub-system with significant intermittency because they set the demand for secondary and fast tertiary regulation. Variable X was defined as the modulus (absolute value) of the change in generation power, normalized to the total installed wind plant power. The range of X values was the interval [0, 100%]. Statistical distributions were derived for 5, 10, 15, 20, 25, 30, 45, and 60 minute changes.
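The variable X can be computed directly from the 5 minute series. The sketch below is one possible way to do it; the function name is hypothetical, and the use of overlapping windows (one difference per reading) is an assumption, since the text does not state how the k minute changes were sampled.

```python
import numpy as np

def abs_changes(power_pct, k_minutes, step_minutes=5):
    """Absolute k-minute changes of a series sampled every step_minutes."""
    if k_minutes % step_minutes != 0 or k_minutes <= 0:
        raise ValueError("k_minutes must be a positive multiple of step_minutes")
    lag = k_minutes // step_minutes
    p = np.asarray(power_pct, dtype=float)
    # Difference between each reading and the one k minutes earlier.
    return np.abs(p[lag:] - p[:-lag])

# Made-up readings in % of installed capacity, not BPA data.
series = [10, 12, 9, 15, 15]
x5 = abs_changes(series, 5)
x10 = abs_changes(series, 10)
```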

4.
Tk: interval of k minutes, k ∈ {5, 10, 15, 20, 25, 30, 45, 60}. Additional relations between the A and B parameters (see row 3 of this table). The variable Tk has a very simple meaning: it is merely a time interval equal to k minutes. For instance, T30 = 30 min. Figure 13 displays the natural logarithm of the ratio A/B as a function of the natural logarithm of the ratio between the relevant time interval, Tk, and the 5-minute interval, T5, for the years 2007 to 2013. Figure 14 shows the experimentally obtained relation between ln(Ak/A60) and ln(Tk/T60) for each year 2007-2013. A second-order polynomial regression was used to fit the data. All of the regression functions were forced to run through (0, 0), which resulted in only two parameters for each year, P and Q. Figure 15 shows the experimentally obtained relation between ln(Bk/B60) and ln(Tk/T60) over the same period. Obviously, linear regression functions (not shown for the sake of clarity), specified for each year, can fit the data very accurately. Just like in the A-parameter case, these regressions must be forced to run through (0, 0), so that they all have merely one parameter per year, R.
Table 5 lists the P, Q, and R parameters, as well as the A15/A60 and B15/B60 factors, for the years 2007 to 2013. It also gives the averages and standard deviations of these quantities. P and Q can vary considerably over the years. However, the applied mathematical operations largely damp these variations, so that the resulting ratio A15/A60 shows a mild standard deviation of 6.45% relative to the average value. The R parameter does not vary much with time, so that the resulting ratio B15/B60 also shows a mild standard deviation of 5.31% relative to the average value. To conclude, if only the 60 minute incremental data are known, one can still estimate the 15 minute distribution parameters, A15 and B15, by taking about 85% of A60 and about 36% of B60, respectively. More generally, parameters of statistical distributions of 15 minute (or any other k minute) changes can be estimated relatively precisely from 60 minute parameters. See more details in [25].
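The conversion from 60 minute to 15 minute parameters can be written down in a few lines. The sketch below uses the factor expressions given in the caption of Table 5 and the rounded averages quoted above (about 0.85 and 0.36); the function names and the sample inputs are hypothetical.

```python
import math

def a15_over_a60(P, Q):
    """Factor from the Table 5 caption: 0.25 ** Q * exp(-1.9218 * P)."""
    return 0.25 ** Q * math.exp(-1.9218 * P)

def b15_over_b60(R):
    """Factor from the Table 5 caption: 0.25 ** R."""
    return 0.25 ** R

def estimate_15min(A60, B60, a_factor=0.85, b_factor=0.36):
    """Estimate (A15, B15) from 60 minute parameters using the rounded averages."""
    return a_factor * A60, b_factor * B60

# Hypothetical 60 minute parameters, only to show the call.
A15_est, B15_est = estimate_15min(A60=2.0, B60=1.0)
```

When yearly P, Q, and R values are available, the two factor functions can be passed to `estimate_15min` in place of the rounded averages.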

5.
MReg, "regulation multiplier". Conceptually, MReg shows how many times the total installed power of wind plants can exceed the available 15 minute flexible regulation reserve, if the system operator deems a default risk of r percent acceptable, nothing else being variable in the power system. Because of the inherent variability of load, other generating plants, and cross-border exchange, and because the regulation reserve may not always be available at its maximum value, in reality the multiplier would be somewhat smaller. Here we give only the formula for small values of r. More details can be found in [25].
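An empirical counterpart of MReg can be sketched directly from the definition: if the reserve must cover the 15 minute changes except during a fraction r of the time, the reserve equals the (1 − r) quantile of those changes, and MReg is the installed capacity divided by that reserve. The code below illustrates this reading only; it is not the closed-form small-r formula of Table 2, the function name is hypothetical, and r is taken as a fraction rather than a percentage.

```python
import numpy as np

def regulation_multiplier(abs_changes_pct, risk):
    """Empirical MReg: 100 % of installed capacity over the (1 - risk) quantile.

    abs_changes_pct: absolute 15 minute changes in % of installed capacity.
    risk: acceptable default risk as a fraction of total time, 0 < risk < 1.
    """
    reserve_pct = np.quantile(np.asarray(abs_changes_pct, dtype=float), 1.0 - risk)
    if reserve_pct <= 0.0:
        raise ValueError("required reserve is zero; multiplier is undefined")
    return 100.0 / float(reserve_pct)

# Made-up changes, not BPA data: the median change is 2 % of installed capacity.
m_demo = regulation_multiplier([0, 1, 2, 3, 4], risk=0.5)
```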

6.
Duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than a certain pre-specified level. The concepts were borrowed from radio engineering, where electromagnetic field strength at the receiving antenna, when a signal is transmitted over time-varying channels, exhibits changes in time, from rather slow ones to very fast ones, and anything in between [27]. Figure 18 explains the definition of level-crossing rate: if the level L is crossed downwards RL times per year, the level-crossing rate equals RL per year. This quantity is an indicator of wind generation volatility. It is a single number assigned to a whole year. Table 6 gives the RL figures for the years 2007-2013. Figure 19 gives ten cumulative distribution functions for ten pre-specified levels L, for the year 2013, while Figure 20, for the sake of clarity, shows only one of these curves (that for the level of 5%) together with the corresponding theoretic curve z(x). Table 7 contains the A parameters calculated for the years 2007-2013, and for each of the ten levels.
Table 8 gives the B parameters of the same distributions.

Nomenclature of the presented results
In subsequent sections of this article, to keep the number of different mathematical symbols reasonably low for the reader's convenience, the following simple notation rules will be used for all the quantities dealt with:
• The random variable dealt with will be marked with X;
• The probability density function of that variable will be marked with y(x), and the cumulative probability distribution function will be marked with z(x). It always applies: y(x) = dz(x)/dx;
• The distribution parameters will be marked with capital Latin letters, e.g. A, B, C, etc.;
• For all other mathematical symbols, lower case Latin or Greek lettering will be used;
• Thus, all the different quantities will be presented using the same basic x-y notation.

DISCUSSION
In this work, historical data on wind electricity generation in the BPA region were analyzed in order to devise simple, yet accurate, statistical models for the following quantities:
• Distributions of total generated power expressed as percentage of total installed capacity;
• Distributions of short-term temporal changes in total generated power expressed as percentage of total installed capacity;
• Level-crossing rates and distributions of duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than a certain pre-specified level.
Additionally, VaR functions were derived for what can be called the "intrinsic capacity" of the wind plant system (the generation power, expressed as percentage of total installed wind plant capacity, that can be counted on with a certain default risk), as well as for the "regulation multiplier", a quantity that shows how many times the total installed wind plant capacity can exceed the fast regulation reserve, given a certain acceptable risk of that reserve running short, provided nothing else in the system is variable.
In each of the above mentioned cases we obtained very simple statistical models and proved in a rigorous manner that they model the associated quantities correctly.
The approach used in this research was to treat a relatively large wind plant (sub)system as a "black box" and to analyze statistical properties of large, very finely quantized time series of measured generating power. The main research goal was to devise simple closed-form statistical models of several stochastic variables of interest, in order to get as simple descriptions of "the nature" as possible, in terms of quantities needed for economic and/or technical analyses.
The "regulation multiplier" concept (see Table 2, row No. 5) was devised in the first place to explore some very basic properties of "the nature" regarding the necessary fast regulation reserves that system operators must keep out of the energy markets in order to balance inherently variable generation. As already mentioned, the MReg quantity, as defined here, is not absolutely accurate. Yet, it gives a good insight into the nature of the problem. By "fast reserve" we understood any spinning capacity reserve that can be fully engaged in 15 minutes, regardless of the flexible generation technology employed. In real electricity systems, capacities engaged in such a way are gradually released by engaging slower flexible generation (e.g. tertiary regulation reserve), so as to make the fastest flexible generators available again.
The idea of our research was that 15 minute changes in total generated power should be matched with the reserve up to a certain probability of default. This risk comes from the fact that 15 minute changes have certain statistical properties, which can be, as in the presented research, normalized to total installed power. Therefore, if the available 15 minute reserve was MReg(r) times smaller than total installed power, it would be sufficient to balance the generation variability, except during r percent of total time. A practical problem with such an approach is that only the variability of wind generation was taken into account. In reality, load also varies, and other generators, although more predictable, may experience e.g. technical faults, so that they are also generally variable. As variances of mutually independent stochastic variables (such as wind generation and load, especially over short times) add, the regulation multiplier would obviously be smaller in reality.
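The effect of adding variances can be illustrated numerically. In the sketch below, the standard deviations are made-up numbers, the function name is hypothetical, and the idea that the required reserve scales with the standard deviation of the combined variability is an additional simplifying assumption, not a result of the paper.

```python
import math

def combined_std(*stds):
    """Standard deviation of a sum of mutually independent variables."""
    return math.sqrt(sum(s * s for s in stds))

# Hypothetical 15 minute variability, in % of installed wind capacity.
sigma_wind = 3.0
sigma_load = 4.0
sigma_total = combined_std(sigma_wind, sigma_load)

# Under the scaling assumption stated above, the multiplier shrinks by this ratio.
shrink_ratio = sigma_wind / sigma_total
```

With these placeholder numbers the combined standard deviation is 5.0 and the ratio is 0.6, i.e. the multiplier would fall to 60% of its wind-only value.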
Moreover, the reserved 15 minute capacity may not always be fully available, as tertiary regulation cannot act instantly, so that the realistic regulation multiplier may be even smaller than that. Next, the 15 minute changes in generation power are merely a proxy for "regulation demand", used here to illustrate the basic relation between acceptable default risk and regulation multiplier. The physical balancing of the system is of a more complex nature. Next, in this calculation we ignored the ever-improving possibilities of forecasting wind generation, which can relieve regulation demand to a certain degree. However, it is still questionable whether changes in wind speed can be forecast across wide territories in a time-coherent manner with usable precision. If not, as regards the impact on demand for regulation, imprecise and incoherent forecasts of fast changes are as good as none. As this particular point was not studied here, definitive conclusions on it are omitted.
Finally, cross-border interconnections may help a little in case the domestic power system runs short of fast secondary regulation for a short period of time, as neighboring systems may provide the missing regulation if they have some free at the moment. Yet, should every country pursue rapid integration of renewable sources, the possibilities for such assistance with natural energy exchanges will inevitably diminish with time, not to mention system operation rules, which require each power system operator to keep its control area balanced all the time.
One of the other notable features of the presented research is the connection between distribution parameters of changes in generation power over different (short) time periods. This is an interesting point because measurements with fine time resolution are generally not easily available on public web sites. However, data with one-hour resolution can be found much more easily. Therefore, a possibility to credibly estimate, say, 15 minute changes from the 60 minute data can lead to considerable growth of experimental data usable for analyses of demand for regulation generated by intermittent wind plants.
In addition, in this research note we introduced new statistics that may prove important for analyses concerned with time variability of wind power production on all time horizons: level-crossing rate, and distribution of intervals with momentary generation power lower than a certain level. Naturally, in order to make use of the statistical distributions and other quantities discussed in this article, it would be good to have more systems from different parts of the world analyzed, to compare the results.

CONCLUSIONS
The models established in this research, which emulate processes of natural production of electricity in a wind plant system, are important for sound policy making as they enable modelers and planners to simulate economic effects that intermittent wind generation exerts on electricity markets, both in the short and long run, but also on electricity system in technical terms.
Only the data gathered from wind plants were analyzed. One could have taken into account other parts of the power system, too, but it was decided to go with the "pure" wind generation data, to study the wind plant system as a separate component. In the more distant future, once renewables assume dominating shares in total energy generation, this kind of approach may prove quite important, as time variability of power production will become one of the biggest economic and technical challenges utilities will have to deal with. The problem of compensating for the time variability of power sources is, at least in the authors' minds, somewhat underestimated these days. Policy making regarding renewables is sometimes heavily influenced by both industry interests and ideology. Therefore, it is important to construct easy-to-use and easy-to-understand modeling tools, to bring more rationality into the debate. As much as a rapid greening of the power system is needed, one of the biggest threats to this process is to keep pursuing it in an economically unsustainable way.

FUTURE RESEARCH
The analysis presented here can be completed and improved in several ways. First, it would be useful to analyze generation from other bigger wind plant systems, perhaps those with greater spatial diversity, to see whether it significantly influences the shape of statistical distributions. (Naturally, the parameters of distributions are expected to change, but one should observe what, if anything, happens to the very types of them.) Then, analyses which take account of other variable parts of the power system, as well, would complement the results displayed here. As wind generation is not the only variable part of the system, it would be useful to determine how a statistical interaction with other variable parts leads to capacity credit at various levels of wind penetration. The concept of MReg should also be deepened, i.e. modeled in more detail, because, as concluded in the Discussion section, under more realistic conditions MReg would turn out to be somewhat smaller.

Table 2.
Statistical distributions, and other quantitative relations, derived in this research

Table 4.
A and B parameters of theoretic probability distribution functions (see Table 2, row No. 3) for changes of generation power in intervals of k minutes, k ∈ {5, 10, 15, 20, 25, 30, 45, 60}, and for the years 2007-2013. The changes of generation power are expressed as percentages of total installed wind capacity

Table 5.
The P, Q, and R parameters (see Table 2, row No. 4) extracted from the experimental data for the years 2007 to 2013, together with the factors that transpose the 60 minute distribution parameters A and B to 15 minute ones. The factor A15/A60 equals $0.25^{Q}\, e^{-1.9218P}$, while the factor B15/B60 equals $0.25^{R}$. The table also displays the average values, as well as the standard deviations normalized to the average, of the aforementioned quantities

Table 7.
Parameters A of probability distributions (Table 2, row No. 6) of duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than a pre-defined level, for the years 2007-2013, and for each of the ten pre-specified levels

Table 8.
Parameters B of probability distributions (Table 2, row No. 6) of duration of intervals with total generated power, expressed as percentage of total installed capacity, lower than a pre-defined level, for the years 2007-2013, and for each of the ten pre-specified levels