Prediction of Global Solar Radiation in India using Artificial Neural Network

Increasing global warming and decreasing fossil fuel reserves have necessitated the use of renewable energy resources like solar energy in India. To maximize returns on a solar farm, it has to be set up at a place with high solar radiation. Solar radiation values are available only for a small number of places and must be interpolated for the rest. This paper uses an Artificial Neural Network (ANN) for this interpolation, by obtaining a function whose input is a combination of 7 geographical and meteorological parameters affecting radiation and whose output is Global Solar Radiation (GSR). The data considered covered the past 9 years for 13 Indian cities. The low error values and high coefficients of determination obtained verified that the results were accurate with respect to the known solar radiation data. Thus, ANN can be used to interpolate the solar radiation for places of interest, depending on the availability of the data.


INTRODUCTION
Many countries have shifted to environment-friendly alternative energy resources such as solar, wind, hydro, biomass and wave energy because of the increasing negative effects of fossil fuels. These alternative resources are renewable and can sustain the increasing energy demand [1]. Among them, solar energy is the most common and is mostly harnessed at solar farms. Moreover, solar radiation is an important quantity in environmental and ecological studies, and its data are needed for scientific research.
Being a densely populated tropical country, India is an ideal destination for solar power utilization and thus for setting up solar farms. The National Solar Mission (NSM), launched in January 2010, has given a great boost to the solar scenario in the country. To achieve the ambitious target of 2,000 MW off-grid and 22,000 MW grid-connected solar generation by 2022, it is imperative to identify solar hotspots in the country [2]. To maximize the return on a farm, it should be set up at a place with high solar radiation. However, solar radiation values are available only for a limited number of places and must therefore be interpolated for other places to find the best possible location for a solar farm.
From the beginning, empirical models were used to estimate solar radiation values. Angstrom [3] pioneered the empirical approach, which exploits the relationship between solar radiation and existing climatic parameters. The model was basic in nature and was widely used for solar radiation estimation. It was later modified by Prescott into the Angstrom-Prescott method [4]. Trnka et al. [5] used this method to estimate solar radiation in Spain using sunshine hours as the input parameter. Though the study showed the utility of the Angstrom-Prescott equation, the results were not close enough to the actual values. Similarly, Bahel et al. [6] developed a model using sunshine hours as the input parameter. The Bristow and Campbell model [7], the Allen model [8] and the Hargreaves model [9] related air temperature to solar radiation. The results showed that these simple models can be used for a rough estimation of solar radiation, but cannot provide values accurate enough for scientific research. The Wu model [10] used maximum temperature and precipitation values as input.
The Chen model [11] used sunshine duration and maximum temperature as the input parameters and showed better results than the earlier models. Still, this model was not accurate enough to be considered the ultimate solution. The relationship captured by these models may be masked by irregular fluctuations, increasing the uncertainty of the prediction. These methods could not fully capture the non-linear characteristics of solar radiation, so they could not achieve the desired accuracy and were therefore less reliable prediction models [12].
Some studies used irradiance data provided by satellites for solar radiation estimation [13]. Though geostationary satellite estimates have the advantage of large spatial coverage, processing their irradiance data yields less accurate values than ground measurements. Polo et al. [14] used Meteosat satellite images for solar radiation estimates over India, which confirmed this. The Kriging method is the most common method used to process satellite data. In the estimation of rainfall (Buytaert et al.) [15] and temperature (Zhao Chuanyan et al.) [16], it has shown a considerable advantage over deterministic interpolation procedures. For solar radiation, Rehman and Ghori [17] used it for prediction in Saudi Arabia, while more recently, Ertekin and Evrendilek (2007) [18] used universal Kriging for daily global solar radiation in Turkey.
The results of these studies showed that the method is reliable in predicting the spatial variability of global solar radiation, but does not give accurate values. In these studies, the Root Mean Squared Error (RMSE) values were high, and the coefficient of determination (R²) values fluctuated greatly. This is probably because the Kriging method is a type of multiple linear regression method and is thus unable to capture the non-linearity in solar radiation prediction.
Stochastic models were used in some studies to capture the randomness present in solar radiation values. Such a model does not follow a single definite direction for prediction: even if the initial position is known, the process may evolve in several different directions. It offers a measure of reliability for the predicted surface, as it predicts unknown values based on the spatial auto-correlation between data points by using both analytical and statistical methods (Burrough and McDonnell) [19]. Bechini et al. [20] implemented a type of stochastic model, known as the Campbell-Donatelli model, to predict solar radiation values in Italy. Meza and Varas [21] used a stochastic model on temperature difference for the prediction of solar radiation. These studies showed that stochastic models were significantly successful in providing an approximate value of solar radiation.
The use of artificial neural networks has been explored in developing prediction models for global solar radiation estimation. However, building such models is an intricate task, as it requires a number of input parameters such as latitude, longitude, altitude, sunshine duration, relative humidity and maximum temperature, an optimum number of neurons, and a good transfer function to train the network model [22]. Gupta and Kewalaramani [23] predicted the compressive strength of concrete using regression analysis as well as artificial neural networks and compared the results obtained. The results showed that the Artificial Neural Network (ANN) was a better technique than regression analysis for prediction, which demonstrates the efficiency of ANN as a prediction model for non-linear data sets.
Elizondo et al. [24] first used ANN to estimate solar radiation in America. The results showed its potential as a reliable method. Similarly, Al-Alawi & Al-Hinai [25] and Mohandes et al. [26] used it to predict solar radiation in Oman and Saudi Arabia, respectively. These studies came at a nascent stage of the development of ANN as a prediction model, but showed its utility for future studies. Adnan et al. [27], Mubiru & Banda [28] and Fadare [29] used neural networks to predict solar radiation in Turkey, Uganda and Nigeria, respectively. The results were compared with the actual solar radiation values and were found to agree with them. Moreno et al. [30] used three different methods, namely Bristow-Campbell, ANN and Kernel Ridge Regression, and compared the results obtained in each case. The results with ANN were better than those with the other two methods. Similarly, Ali Rahimikhoob [31] did extensive research in the semi-arid fields of Iran and concluded that the Global Solar Radiation (GSR) results obtained with ANN are better than those obtained with the Hargreaves and Samani (HS) equation.
Some recent studies have used ANN for prediction in India. Sivamadhavi and Selvaraj [32] used data from three weather stations in Tamil Nadu, India, and the results showed that ANN is a potential tool for predicting solar radiation in India. Krishnaiah et al. [33] used data from all over India and were successful in predicting solar radiation accurately. These studies demonstrated the efficiency of ANN as a solar radiation prediction model, but they did not use other input parameters such as relative humidity. Hasni et al. [34] built neural networks as a function of air temperature and relative humidity data to predict solar radiation in the southwestern region of Algeria. The results were compared with the solar radiation data collected. The calculated radiation values were significantly close to the desired values; thus, relative humidity can be seen as a potential input parameter for prediction.
Furthermore, past studies did not use different combinations of the various input parameters. This study analyzes whether ANN can be used to predict GSR values even when data for some of the input parameters are lacking. Therefore, along with the use of relative humidity as an input parameter, various combinations of input parameters have been used as input data sets. These combinations are given in Table 1. The results of the neural networks thus prepared are shown and evaluated separately. With the help of ANN, a function was created with the meteorological and geographical parameters as input and GSR as output. A multilayer feed-forward neural network was used, with back propagation as the learning method and the Levenberg-Marquardt Algorithm (LMA) as the learning algorithm. The data used covered 13 cities of India over the past 9 years.
The performance of these functions was evaluated in terms of R², RMSE and Relative Error (RE) values. Out of the 13 places, 9 were used for training the function and 4 were used to test it. The radiation values obtained in testing were compared with the actual radiation values of those places.
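The three performance measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the paper does not state its exact formulas, so RE is assumed here to be the mean absolute relative error in percent and R² is assumed to be 1 − SS_res/SS_tot. The sample values are hypothetical and serve only to exercise the functions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the inputs."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def relative_error(actual, predicted):
    """Mean absolute relative error in percent (assumed definition)."""
    n = len(actual)
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n

def r_squared(actual, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical GSR values in MJ/m^2/day, for illustration only.
actual = [15.0, 18.0, 21.0, 24.0]
predicted = [15.5, 17.5, 21.5, 23.5]

print(rmse(actual, predicted))
print(relative_error(actual, predicted))
print(r_squared(actual, predicted))
```

RMSE carries the unit of the radiation values, whereas RE and R² are dimensionless, which is why the three are reported side by side.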

Data used
Geographical data (latitude, longitude and altitude) and meteorological data (relative humidity at the 12th hour, maximum temperature and sunshine duration) for the past 9 years were taken for 13 stations, namely, Jodhpur, Jaipur, Okha, Ahmedabad, Bhavnagar, Bhopal, Nagpur, Mumbai, Pune, New Delhi, Hyderabad, Goa and Jaisalmer. The data used were 9-year averages for each of the twelve months. Out of the 13 cities, 9 cities, namely, Jodhpur, Okha, Ahmedabad, Bhopal, Mumbai, Pune, New Delhi, Hyderabad and Goa, were used for training the network, and the other 4 cities, namely, Jaipur, Bhavnagar, Nagpur and Jaisalmer, were used for testing the network. The test results were then stored and compared. All the input data were taken from the India Meteorological Department, Pune (IMD, Pune).
Latitude and longitude were taken in degrees. Altitude data were in meters. Sunshine duration was taken in hours, while the temperature was taken in degrees Celsius. Relative humidity was in percent. Radiation values were in MJ/m²/day.
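The station split and units described above can be sketched as a small data structure. This is an illustration only: the record layout, the field names and the helper functions are hypothetical, and the example combination of inputs is not taken from Table 1.

```python
from dataclasses import dataclass

@dataclass
class MonthlyRecord:
    """One 9-year monthly average for one station (hypothetical layout)."""
    city: str
    month: int               # 1..12, used here only as an index
    latitude_deg: float
    longitude_deg: float
    altitude_m: float
    sunshine_h: float
    max_temp_c: float
    rel_humidity_pct: float
    gsr_mj_m2_day: float     # target value

# Station split as stated in the text.
TRAIN_CITIES = ["Jodhpur", "Okha", "Ahmedabad", "Bhopal", "Mumbai",
                "Pune", "New Delhi", "Hyderabad", "Goa"]
TEST_CITIES = ["Jaipur", "Bhavnagar", "Nagpur", "Jaisalmer"]

def split_records(records):
    """Separate records into training and testing sets by station name."""
    train = [r for r in records if r.city in TRAIN_CITIES]
    test = [r for r in records if r.city in TEST_CITIES]
    return train, test

def input_vector(record, fields):
    """Build the network input for one chosen combination of parameters."""
    return [getattr(record, name) for name in fields]

# A hypothetical combination of inputs, for illustration only.
EXAMPLE_COMBINATION = ["latitude_deg", "longitude_deg", "sunshine_h"]
```

Splitting by station rather than by month means that the test cities are never seen during training, which matches the interpolation use case of estimating GSR at places without radiation measurements.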

METHOD
ANN is widely acclaimed and used because of its ability to learn from past experience. This is possible because of its fundamental processing units, known as neurons. These are analogous to the neurons in the human brain, and just as biological neurons learn from previous experience, the neurons in an ANN are capable of doing so. As the network is artificially created, it is called an Artificial Neural Network (ANN).
ANN is a non-linear mapping between an input vector and an output vector, which is obtained without deriving an explicit equation. It is essentially a web of interconnected neurons arranged in layers. There are three types of layers: the input layer, consisting of inputs; the hidden layer, consisting of a web of neurons; and the output layer, consisting of outputs. Each of these layers has a different function. Details of a neuron are given in Figure 1. Neurons act as nodes in a layer and are connected by weights. The output of each neuron is a function of the weighted sum of all its inputs, modified by a simple non-linear transfer function [35]. A multilayer feed-forward network (Figure 2) consists of several layers, which propagate the input signal in the forward direction. These layers have no lateral connections, nor do they send back any feedback.
Depending on the nature of their training, ANNs can be classified into two broad categories: supervised and unsupervised neural networks. In the former, the learning algorithm trains the network by adjusting the weights using the desired output that is provided. In the latter, the network is left on its own to analyze the inputs [36]. In this study, a supervised neural network is used to obtain the function. In a supervised neural network, an algorithm is provided to train the network, and the weights are adjusted to obtain the desired relationship between the input and the output. This step is known as training and is an important step in the development of a neural network.
Neural networks use the back propagation method as the iterative method for training. The algorithm consists of multilayer feed-forward nets with tan-sigmoid non-linear threshold units. For training, neural networks take data in the form of S pairs of input-output vectors to define the problem. These pairs are then used to adjust the weights of the connections between the neurons [37]. The weight adjustment is a function of the learning rate (η) and the output of unit i (o_i), and is expressed as:

$$\Delta w_{ij} = \eta\,\delta_j\,o_i$$

where δ_j is the back-propagated error term. It varies depending on the position of the neuron. If the neuron is in the output layer, the error term is expressed as:

$$\delta_j = (d_j - o_j)\,f'(net_j)$$

If the neuron is in a hidden layer, the error term is expressed as:

$$\delta_j = f'(net_j)\sum_k \delta_k\,w_{jk}$$

where k is the index of neurons in the layer m + 2, d_j is the desired output of unit j, net_j is the weighted sum of the inputs to unit j, and f' is the derivative of the transfer function.
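The back-propagation update described above can be sketched for a very small network. This is a minimal illustration under stated assumptions, not the network used in the paper: it assumes the standard delta rule (weight change = η · δ_j · o_i), one hidden layer of tanh units, a single output, no bias terms, and one hypothetical training pair. The function names, layer sizes and values are all illustrative.

```python
import math
import random

def tanh_prime(y):
    """Derivative of tanh written in terms of its output y = tanh(net)."""
    return 1.0 - y * y

def forward(x, w_hidden, w_out):
    """Return the hidden-layer outputs and the single network output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = math.tanh(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, out

def backprop_step(x, d, w_hidden, w_out, eta=0.1):
    """Apply one delta-rule update in place; return the error before the update."""
    hidden, out = forward(x, w_hidden, w_out)
    # Output layer: delta = (desired - actual) * f'(net)
    delta_out = (d - out) * tanh_prime(out)
    # Hidden layer: delta_j = f'(net_j) * sum_k delta_k * w_jk (a single k here)
    delta_hidden = [tanh_prime(h) * delta_out * w for h, w in zip(hidden, w_out)]
    for j, h in enumerate(hidden):
        w_out[j] += eta * delta_out * h
    for j, row in enumerate(w_hidden):
        for i, xi in enumerate(x):
            row[i] += eta * delta_hidden[j] * xi
    return 0.5 * (d - out) ** 2

random.seed(0)
x = [0.2, -0.4, 0.7]   # hypothetical scaled inputs
d = 0.5                # hypothetical scaled target
w_hidden = [[random.uniform(-0.5, 0.5) for _ in x] for _ in range(4)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(4)]
errors = [backprop_step(x, d, w_hidden, w_out) for _ in range(200)]
print(errors[0], errors[-1])
```

The hidden-layer error terms are computed before any weight is changed, so every update in a step uses the same forward pass.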
The learning rate η directly affects the oscillations associated with learning. The smaller the learning rate, the smaller the oscillations. However, a higher learning rate is needed for training to reach accurate results in a reasonable number of iterations. To achieve this without increasing the oscillations, a momentum term can be included in the weight adjustment. This reduces the oscillations while allowing a higher learning rate. With the momentum term, the weight adjustment becomes:

$$\Delta w_{ij}(n+1) = \eta\,\delta_j\,o_i + \alpha\,\Delta w_{ij}(n)$$

where n is the iteration number and α is a positive constant [38].
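The effect of the momentum term can be illustrated with a short sketch. It assumes the standard form, in which the new weight change is η times the gradient term plus α times the previous weight change. The gradient sequences below are hypothetical and chosen only to show the two regimes.

```python
def momentum_step(gradient_term, previous_delta, eta=0.1, alpha=0.5):
    """New weight change: eta * gradient_term + alpha * previous weight change."""
    return eta * gradient_term + alpha * previous_delta

def run(gradient_terms, eta=0.1, alpha=0.5):
    """Apply the momentum update over a sequence and return every weight change."""
    delta = 0.0
    history = []
    for g in gradient_terms:
        delta = momentum_step(g, delta, eta, alpha)
        history.append(delta)
    return history

# Consistent gradient direction: the weight changes build up.
steady = run([1.0] * 60)
# Gradient that flips sign every iteration: the weight changes are damped.
oscillating = run([1.0 if n % 2 == 0 else -1.0 for n in range(60)])

print(steady[-1], abs(oscillating[-1]))
```

With a consistent gradient the step size settles near η/(1 − α), larger than η alone, while with an alternating gradient its magnitude settles near η/(1 + α), smaller than η. This is the sense in which momentum allows faster learning with fewer oscillations.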
A least-mean-square algorithm is used to train the network. Here, the Levenberg-Marquardt Algorithm (LMA) is used, as it is fast and well suited to least-squares curve fitting. LMA provides a numerical solution for minimizing an error function, which is the average of the squared difference between the calculated value and the actual value. The error function can be expressed as:

$$E = \frac{1}{S}\sum_{s=1}^{S}\sum_{k}\left(d_{sk} - o_{sk}\right)^2$$

where d_sk is the kth element of the sth desired pattern vector and o_sk is the corresponding element of the network output. LMA minimizes this error function and adjusts the weights accordingly to get the desired output. In this way, neural networks are optimized to get the most accurate results. Each neural network has a specific number of neurons, which is the optimum number for a specific set of input data. Here, 7 different sets of input data are used. Each set is optimized with a different number of neurons, and thus by a different neural network.
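The idea behind Levenberg-Marquardt can be sketched on a problem much smaller than a neural network. The example below is an illustration under stated assumptions, not the MATLAB routine used in the paper: it fits a hypothetical two-parameter model y = a·tanh(b·x) to synthetic data, solves the damped normal equations (JᵀJ + λI)·step = Jᵀr by hand for the 2×2 case, and uses an assumed damping schedule (divide λ by 10 after an accepted step, multiply by 10 after a rejected one).

```python
import math

def model(params, x):
    a, b = params
    return a * math.tanh(b * x)

def sse(params, xs, ys):
    """Sum of squared differences between desired and computed values."""
    return sum((y - model(params, x)) ** 2 for x, y in zip(xs, ys))

def jacobian_row(params, x):
    """Partial derivatives of the model output with respect to a and b."""
    a, b = params
    t = math.tanh(b * x)
    return t, a * x * (1.0 - t * t)

def lm_fit(xs, ys, params, lam=1e-2, iters=50):
    """Levenberg-Marquardt iterations for a two-parameter model."""
    err = sse(params, xs, ys)
    for _ in range(iters):
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            j1, j2 = jacobian_row(params, x)
            r = y - model(params, x)
            h11 += j1 * j1
            h12 += j1 * j2
            h22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        # Solve the damped 2x2 system (J^T J + lam * I) * step = J^T r.
        a11, a22 = h11 + lam, h22 + lam
        det = a11 * a22 - h12 * h12
        s1 = (a22 * g1 - h12 * g2) / det
        s2 = (a11 * g2 - h12 * g1) / det
        trial = (params[0] + s1, params[1] + s2)
        trial_err = sse(trial, xs, ys)
        if trial_err < err:
            params, err, lam = trial, trial_err, lam / 10.0  # accept the step
        else:
            lam *= 10.0                                      # reject, damp more
    return params, err

# Synthetic data generated from a = 2.0, b = 0.5 (hypothetical values).
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [model((2.0, 0.5), x) for x in xs]
start = (1.0, 1.0)
fitted, final_err = lm_fit(xs, ys, start)
print(fitted, final_err)
```

A small λ makes the step close to a Gauss-Newton step, which is fast near the minimum, while a large λ shrinks it toward a short gradient-descent step, which is safer far from it. Because a step is accepted only when it lowers the error, the error never increases from one iteration to the next.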
ANNs have been used in a broad range of applications, including pattern classification, function approximation, optimization, prediction and automatic control. ANNs are also used for meteorological purposes, and this paper uses them to predict GSR values based on different sets of measured data.

RESULTS AND DISCUSSION
The neural networks were created with MATLAB 2010 and the results obtained were stored. The radiation results were obtained in MJ/m²/day. The RMSE values obtained varied from 0.239 to 0.731 and had the same unit as radiation. The value of 0.239 for combination 5 was exceptionally low, while the other values ranged from 0.452 to 0.731. The RE obtained varied from 4.9% to 6.2%, with 4.9% again being exceptionally low for combination 4 and the other values ranging from 5.5% to 6.2%. R² varied from 0.85 to 0.92, being lowest for combination 6 and ranging from 0.89 to 0.92 for the others. The number of neurons used for the combinations was in a close range, varying from 21 to 25. The results obtained, with the number of neurons used for each set of data, are shown in Table 2. The radiation results obtained for all 4 cities are shown in Figures 3 to 6. From the table it can be seen that the results obtained are very close to the actual values. All combinations except 2 and 6 have R² values above 0.9, which indicates a high correlation between these combinations of input parameters and the GSR values. Moreover, their RMSE and RE values are small, making them more reliable in terms of error.
From the graphs, it can be seen that the trend for each combination generally follows the trend of the actual values. For Bhavnagar, combination 7 gave the most accurate values while combination 2 gave the most scattered values. In the case of Jaipur, almost all the combinations gave equally accurate values except combination 2. In the case of Jaisalmer, combinations 1 and 6 are less accurate than the others, as are combinations 1 and 3 in the case of Nagpur. For some combinations, the graph deviates more from the actual values at a few instances. This may be because the input data values for these instances are more scattered and do not follow the usual trend, which may result from instrumental or human error while measuring these values.
On analyzing the graphs, combination 1 gave the least accurate results, which can also be seen from its high RMSE value. This may be because altitude is not the most appropriate factor for GSR estimation among the parameters used. It is a geographical parameter rather than a meteorological one; hence, this combination consists only of geographical parameters. This means the combination does not involve any meteorological parameter to determine the actual effect of the sun on the radiation value for that place. This makes it necessary to evaluate combinations of geographical parameters with meteorological parameters. Sunshine duration and maximum temperature are meteorological parameters and are more trustworthy options when combined with geographical parameters. This is supported by the better results obtained with combinations 2, 3 and 4 than with combination 1.
For combination 7, which includes all the geographical and meteorological parameters used in the paper, the results agree well with the actual values, as shown by its high R² value and low RE and RMSE values. For combination 6, which excludes relative humidity as an input parameter, the results differ. Its R² value is the lowest of all. This can be attributed to the fact that relative humidity is an important factor for GSR estimation. Relative humidity gives an estimate of the amount of water vapour present in the atmosphere. Water vapour, along with other particles present in the air, is responsible for scattering and reflecting radiation. So, when relative humidity is considered, it gives a better estimate of the solar radiation that is actually transmitted through the atmosphere. The scattered component that still reaches the ground is known as diffuse solar radiation, which, combined with direct beam radiation, gives the total global solar radiation. The fact that relative humidity is a reliable input parameter is corroborated by the results obtained from combination 5. It has the lowest RMSE value, a low RE value and a high R² value.
Although combinations 5 and 7 seem to be more reliable as they use all the important parameters, all the other combinations are reliable and efficient too. Thus, when predicting GSR values using ANN, any of the combinations of input parameters mentioned can be used, according to the input parameters available to the user.

CONCLUSION
The GSR value associated with a place is very important for scientific research as well as engineering projects. Since the value is not available for all places, it has to be interpolated from known stations, using the known meteorological and geographical data. The interpolation model used here is ANN. Due to its capability of learning from the past, ANN proved to be a reliable method for GSR estimation. Here, a function was developed using various geographical and meteorological parameters of the places considered as input. Solar radiation values were predicted using various combinations of these parameters, with relative humidity as one of the parameters. These values were then compared with their known original values to validate the efficiency of the model. The data considered were for the past 9 years and 13 cities in India, of which data for 9 cities were used for training the network and data for the other 4 were used for testing.
The results obtained were very close to the original values. The RMSE values obtained varied from 0.239 to 0.731, while RE varied from 4.8% to 6.2% and R² varied from 0.86 to 0.93. The close agreement with the actual values obtained from combinations 5 and 7 shows the importance of relative humidity as an input parameter. This is because relative humidity helps to obtain a better estimate of diffuse solar radiation. In conclusion, the combination of all 7 parameters, when used with ANN, can be expected to give accurate results. However, depending on the availability of the data, any of the given combinations of parameters can be used for prediction using ANN.

Table 1. Combinations of parameters used

Table 2. Given combinations and their results
Figure 6. GSR results for Nagpur