Using of an Artificial Neural Networks with Particle Swarm Optimization (ANN-PSO) Model in Prediction of Cost and Delay in Construction Projects

Article history: Received 27 May 2021; Accepted 14 August 2021
Construction project delay is a global phenomenon. Delay risk is regarded as a main challenge faced by construction firms. It has an adverse effect on project performance, resulting in cost overruns and reduced productivity. In Iraq, most construction projects exceeded their planned duration and were delayed, resulting in a loss of productivity and income. The objective of this paper was to predict the cost and delay of construction projects, and so illustrate the effects of their risks, using artificial neural networks with the particle swarm optimization method (ANN-PSO). Risk factors were identified and analysed using Probability and Impact Analysis and were adopted as the model inputs. The model outputs were the ratio of the contractor's profit to project cost and the delay in construction projects. An ANN model trained with the backpropagation (BP) optimization method was also developed to assess the performance of the ANN-PSO model. The coefficient of correlation (R), determination coefficient (R²), and root mean squared error (RMSE) were used to evaluate the accuracy of the models. The ANN-PSO model performed well in delay prediction. Its performance indices for cost and delay prediction were (R=0.929, R²=0.863, RMSE=0.044) and (R=0.998, R²=0.996, RMSE=0.094), respectively. In terms of R and R², the ANN-PSO model therefore performed better in delay prediction than in cost prediction. However, the ANN-BP model showed better performance than ANN-PSO in terms of cost prediction.


Introduction
Accurate estimation of construction costs is a critical factor in a project's success. The estimate is made with minimal project information, which is helpful in the preliminary design stage [1]. It also helps project managers to finish the work on time and to control the project more effectively [1]. Bakhary et al. (2015) [2] identified factors that contribute to cost overrun, and potential measures to overcome the problem, with a focus on construction projects. Questionnaires were distributed to 30 respondents from construction firms. Descriptive statistics and ranking analysis were used in the data analysis. The results showed that the most severe factor contributing to cost overrun was inaccurate or poor estimation of the original cost, and that the essential method for controlling construction costs was proper project costing and financing. Ali et al. (2016) [3] examined risk and cost management within the variation management of construction projects. A questionnaire was used to collect the data. Random sampling was employed; 105 questionnaires were distributed and 90 were returned. The correlation analysis showed a significant relationship among risk, cost, and variation management.
* Corresponding author. E-mail address: mh79180@gmail.com. DOI: 10.24237/djes.2021.14307
Alashwal and Chew (2017) [4] distributed a questionnaire to 83 government agencies, consultant companies, and contractors to determine the use of simulation methods for cost assessment and control and to assess their effect on the cost-effectiveness of projects. The results showed that respondents' understanding and use of cost simulation methods in the construction industry was limited.
Lee et al. (2019) [5] proposed a knowledge-based risk mapping tool for systematically estimating the risk-related parameters that cause cost overrun in global markets.
Construction delays affect the time and cost of projects. A construction project is usually considered successful if it is completed on time, within its budget, and achieves its quality targets [6]. Emam et al. (2015) [7] employed a wide variety of analytical methods to obtain the most precise statistical ranking of delay causes. A survey questionnaire was prepared and piloted through interviews before being issued to practitioners, including clients, consultants, and contractor organizations. The results revealed that the top five factors causing delay in large building projects are: slow decision-making; discrepancies between specifications and drawings; major changes in design during construction; delay in settlement of contractor claims; and unreasonable project time frames. Alotaibi et al. (2014) [8] examined the critical factors contributing to construction delays and identified potential contributions of project management tools and techniques to minimizing them. Al-Kharashi and Skitmore (2019) [9] studied the causes of delays in public sector construction projects. They classified the causes of delay into six main groups, including client-related, contractor-related, consultant-related, materials-related, and labour and equipment-related causes.
Artificial intelligence (AI) models have demonstrated their capability for solving dynamic, uncertain, and complex tasks [10]. Chou et al. (2013) [11] developed two ANN models to predict the lowest tender price of primary and secondary school buildings. The findings showed that the two ANN models learned effectively during the training stage and generalized well in the testing stage, with average accuracies of 79.3% and 82.2%.
The Ministry of Municipal and Rural Affairs in Baghdad declared that most construction projects in Iraq exceeded their planned duration and were delayed because contractors were occupied with other projects. Awarding contracts on the minimum tender price undermines the conditions for an efficient bidding system; this is an important cause of poor performance and delays in public construction projects in Iraq [12].
Previous studies have focused on cost estimation and the causes of cost overrun [1][2][3][4][5] and on construction delays [6][7][8][9]. However, they did not predict cost and delay using artificial neural networks combined with particle swarm optimization. Thus, the objective of this study was to predict the cost and delay of construction projects using artificial neural networks with the particle swarm optimization method (ANN-PSO).

Methodology
The research methodology adopted in this study is summarized in the following steps:
1. Data collection: the required data were collected from 47 construction projects of AL-ZAWRAA state company in Baghdad city.
2. Model development: based on the collected data, two models were developed to predict the cost and delay in construction projects using artificial neural networks and particle swarm optimization.
3. Performance evaluation: the two models were evaluated using the coefficient of correlation (R), determination coefficient (R²), and root mean squared error (RMSE).

The tools used for the model development
The tools adopted for developing the models for predicting the cost and delay in the construction projects are explained below:

Artificial Neural Networks (ANNs)
ANNs are intelligent tools inspired by the biological neural networks of humans and animals, which can conveniently learn patterns and predict the results of a problem in a high-dimensional space [13]. They can map a set of inputs to a set of outputs in a noisy and complex dataset. The multi-layer perceptron (MLP) is a robust and straightforward class of feed-forward ANNs. A typical MLP network consists of an input layer, one or more hidden layers, and an output layer [14]. The first layer takes the values of the inputs and sends them to the neurons in the second layer. Inside each neuron, a weighted sum of the inputs is calculated, and this value plus a bias value is transformed by an activation function, as shown in Figure 1. Finally, the output signal is passed to the neurons in the output layer. The mathematical process of the ANN used in this study was formulated as below [15]:
$$I_j = \sum_{i} w_{ij} x_i + \theta_j$$
$$y_j = f(I_j)$$
where $I_j$ is the activation level of unit j, $w_{ij}$ is the weight connecting units i and j, $x_i$ is the input value from unit i, $\theta_j$ is the bias of unit j, $y_j$ is the output value of unit j, and $f(I_j)$ is the transfer function.
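The weighted-sum-plus-bias computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB code: the function name `mlp_forward`, the random weights, and the input values are all assumptions, and only the layer sizes (ten inputs, five hidden neurons, one output) and the hyperbolic tangent transfer function are taken from the paper.

```python
import numpy as np

def mlp_forward(x, W_hidden, theta_hidden, W_out, theta_out):
    """Forward pass of a single-hidden-layer MLP with tanh units.

    x            : (n_inputs,) input values x_i
    W_hidden     : (n_inputs, n_hidden) weights w_ij from input i to hidden unit j
    theta_hidden : (n_hidden,) biases theta_j of the hidden units
    W_out        : (n_hidden, n_outputs) weights from hidden to output units
    theta_out    : (n_outputs,) biases of the output units
    """
    I_hidden = x @ W_hidden + theta_hidden  # I_j = sum_i w_ij * x_i + theta_j
    y_hidden = np.tanh(I_hidden)            # y_j = f(I_j)
    I_out = y_hidden @ W_out + theta_out    # same rule applied at the output layer
    return np.tanh(I_out)

# Illustrative values only: random weights and ten made-up normalized inputs.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10)
W1 = rng.normal(scale=0.5, size=(10, 5))
b1 = np.zeros(5)
W2 = rng.normal(scale=0.5, size=(5, 1))
b2 = np.zeros(1)
y = mlp_forward(x, W1, b1, W2, b2)
```

Because the output unit also uses tanh, `y` always lies strictly between −1 and 1, which is compatible with targets normalized to the interval between 0 and 1.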
The most common transfer functions used in neural networks are the hyperbolic tangent and the logistic sigmoid. Because the prediction problem is nonlinear, the hyperbolic tangent was used, as it may lead to more accurate results [16]. This function ranges between −1 and 1 and is defined as follows:
$$f(I_j) = \tanh(I_j) = \frac{e^{I_j} - e^{-I_j}}{e^{I_j} + e^{-I_j}}$$
Neural networks are trained to achieve the expected performance. Training is the process of changing and adjusting the weights of the network. The main objective of this process is to find the weights between neurons that yield the smallest total error [17]. Backpropagation (BP) is the most widely used optimization method for training neural networks [9]. The Levenberg-Marquardt algorithm (LMA) is usually the fastest BP method for training the network [11]. Therefore, an MLP neural network with the hyperbolic tangent activation function, trained by BP with the LMA optimization algorithm, was used in this study.
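Three standard relations may help make the training discussion more concrete: the link between the hyperbolic tangent and the logistic sigmoid, the derivative that backpropagation uses when passing errors through a tanh unit, and the textbook form of the Levenberg-Marquardt weight update. The symbols $\sigma$, $\mathbf{J}$ (Jacobian of the residuals with respect to the weights), $\mathbf{e}$ (residual vector, network output minus target), $\mu$ (damping factor), and $\Delta\mathbf{w}$ are not from the original text and are introduced here only for illustration; the paper does not state which variant its MATLAB code used.

```latex
\begin{align}
f(x) &= \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} = 2\,\sigma(2x) - 1,
\qquad \sigma(x) = \frac{1}{1+e^{-x}} \\
f'(x) &= 1 - \tanh^{2}(x) \\
\Delta \mathbf{w} &= -\left(\mathbf{J}^{\top}\mathbf{J} + \mu \mathbf{I}\right)^{-1} \mathbf{J}^{\top}\mathbf{e}
\end{align}
```

For large $\mu$ the update approaches a small gradient-descent step, and for small $\mu$ it approaches the Gauss-Newton step, which is why the method is often fast on small networks.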

Particle Swarm Optimization (PSO)
Particle swarm optimization (PSO) is an evolutionary computation algorithm inspired by the social behaviour of bird flocking or fish schooling. Kennedy and Eberhart first proposed the PSO technique in 1995 [16]. The algorithm benefits from a high convergence rate and performs adequately in many engineering problems [18]. In this technique, a cost function to be minimized or maximized is first defined. Then, a number of particles are created and distributed in the D-dimensional space of the problem. Each particle contains the variables of the problem, so the objective function can be evaluated for each particle. Finally, each particle's velocity and position are updated as follows until the algorithm converges [19]:
$$V_i^{k+1} = w V_i^{k} + C_1 r_1 \frac{P_{best,i}^{k} - \rho_i^{k}}{\Delta t} + C_2 r_2 \frac{G_{best}^{k} - \rho_i^{k}}{\Delta t}$$
$$\rho_i^{k+1} = \rho_i^{k} + V_i^{k+1} \Delta t$$
where the subscripts i and k denote the particle and the iteration number, respectively. $\rho_i = \{\rho_{i1}, \rho_{i2}, \ldots, \rho_{ij}, \ldots, \rho_{iD}\}$ and $V_i = \{v_{i1}, v_{i2}, \ldots, v_{ij}, \ldots, v_{iD}\}$ are the position and velocity vectors, respectively. The vectors $P_{best,i}^{k} = \{p_{i1}, p_{i2}, \ldots, p_{ij}, \ldots, p_{iD}\}$ and $G_{best}^{k} = \{g_1, g_2, \ldots, g_D\}$ are the best position of the ith particle over its history up to iteration k and the position of the best particle in the swarm at iteration k, respectively. i = 1, 2, 3, ..., N is a counter over the particles, and D is the number of problem dimensions or variables. Moreover, $C_1$ is a cognitive parameter indicating the degree of local search, whereas $C_2$ is a social parameter reflecting the degree of global search. In addition, $r_1$ and $r_2$ are two independent random numbers uniformly distributed between 0 and 1, and w is the inertia weight used to preserve the previous velocity of the particles during the optimization process. $\Delta t$ is the time interval over which the position and velocity are updated; this parameter is usually taken to be equal to 1.
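The update cycle described above can be sketched as a small, generic PSO loop. This is an illustrative sketch with the time interval taken as 1, not the authors' implementation: the function name `pso_minimize`, the parameter values (w = 0.7, C1 = C2 = 1.5), the search bounds, and the sphere test function are all assumptions chosen for the example.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=10, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `cost` with a basic global-best PSO (time step of 1)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))  # particle positions
    vel = np.zeros((n_particles, dim))                  # particle velocities
    p_best = pos.copy()                                 # best position of each particle
    p_best_cost = np.array([cost(p) for p in pos])
    g_idx = int(np.argmin(p_best_cost))
    g_best = p_best[g_idx].copy()                       # best position in the swarm
    g_best_cost = float(p_best_cost[g_idx])

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))             # uniform random numbers in [0, 1)
        r2 = rng.random((n_particles, dim))
        # inertia term + cognitive (local) term + social (global) term
        vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = pos + vel
        costs = np.array([cost(p) for p in pos])
        improved = costs < p_best_cost
        p_best[improved] = pos[improved]
        p_best_cost[improved] = costs[improved]
        g_idx = int(np.argmin(p_best_cost))
        if p_best_cost[g_idx] < g_best_cost:
            g_best = p_best[g_idx].copy()
            g_best_cost = float(p_best_cost[g_idx])
    return g_best, g_best_cost

def sphere(p):
    """Toy cost function with its minimum of 0 at the origin."""
    return float(np.sum(p ** 2))

best_pos, best_cost = pso_minimize(sphere, dim=3)
```

The cognitive term pulls each particle toward its own best position and the social term pulls it toward the swarm's best position, so raising C1 relative to C2 favours local exploration while the reverse favours faster collapse onto the current best.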

Models development
The steps followed in developing the models for predicting cost and delay in construction projects are described below.

Data and preparation
The data were used to build prediction models using the ANN-PSO model. Data on 47 construction projects were gathered from AL-ZAWRAA state company in Baghdad city, a public sector company with an engineering staff covering various specialties such as civil, mechanical, and electrical engineering. The projects date from 2010 to 2020. These data were used to study the two main problems of the research. The first was the cost of construction projects, represented by the percentage of the contractor's profit to the total cost of the project. The second was the delay in construction projects, for which 21 of the 47 projects were considered. Ten variables were selected as model inputs for studying cost and delay. The model output was the contractor's profit as a percentage of the total project cost for the cost model, and the delay period of the project for the delay model. Because the prediction problem is nonlinear, the input and output data were normalized to the interval between 0 and 1. To obtain high model accuracy, an interpolation code was used; by trial and error, the researcher found that 500 points gave better results. The top ten risk factors considered in this study are summarized in Tables 1 and 2.
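The two preprocessing steps mentioned above, scaling to the interval between 0 and 1 and interpolating to 500 points, can be illustrated with a short sketch. The paper does not say which interpolation scheme its code used, so the linear resampling below (NumPy's `np.interp` over an evenly spaced index) is an assumption made only for illustration, as are the function names and the made-up delay values.

```python
import numpy as np

def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:                      # constant series: avoid division by zero
        return np.zeros_like(values)
    return (values - v_min) / (v_max - v_min)

def resample_linear(values, n_points=500):
    """Linearly interpolate a series onto n_points evenly spaced positions.

    ASSUMPTION: linear interpolation over the sample index; the original
    interpolation code is not described in the paper.
    """
    values = np.asarray(values, dtype=float)
    old_grid = np.linspace(0.0, 1.0, num=len(values))
    new_grid = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(new_grid, old_grid, values)

# Made-up delay periods (in days) for five hypothetical projects.
delay_days = [30, 120, 45, 200, 90]
normalized = min_max_normalize(delay_days)
dense = resample_linear(normalized, n_points=500)
```

Interpolated points are not independent observations, so accuracy figures obtained on such a densified series should be read with that in mind.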

Performance evaluation
To evaluate the performance of the models, the coefficient of correlation (R), determination coefficient (R²), and root mean squared error (RMSE) were used. These indicators are calculated as below:
$$R = \frac{\sum_{s=1}^{S} (P_s - \bar{P})(T_s - \bar{T})}{\sqrt{\sum_{s=1}^{S} (P_s - \bar{P})^2 \sum_{s=1}^{S} (T_s - \bar{T})^2}}$$
$$R^2 = (R)^2$$
$$RMSE = \sqrt{\frac{1}{S} \sum_{s=1}^{S} (P_s - T_s)^2}$$
where P is the measured output, T is the actual output, $\bar{P}$ and $\bar{T}$ are their mean values, and S is the total number of samples (training or testing). The training and testing portions for each model were 70% and 30%, respectively, and all the code needed to build the models was developed in MATLAB. The input and output data for the ANN-BP and ANN-PSO models are given in Tables 3 and 4.
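A compact sketch of the three indicators and of a 70%/30% split is given here for illustration. The function names, the random shuffling, and the sample values are assumptions; in particular, R² is computed as the square of R, which is an assumption that agrees with the values reported in this paper (for example, 0.929² ≈ 0.863) but is not confirmed by the text.

```python
import numpy as np

def correlation_coefficient(P, T):
    """Pearson correlation between model outputs P and actual outputs T."""
    P, T = np.asarray(P, dtype=float), np.asarray(T, dtype=float)
    dP, dT = P - P.mean(), T - T.mean()
    return float(np.sum(dP * dT) / np.sqrt(np.sum(dP ** 2) * np.sum(dT ** 2)))

def rmse(P, T):
    """Root mean squared error over S samples."""
    P, T = np.asarray(P, dtype=float), np.asarray(T, dtype=float)
    return float(np.sqrt(np.mean((P - T) ** 2)))

def split_70_30(n_samples, seed=0):
    """Return shuffled index arrays for a 70% training / 30% testing split."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(0.7 * n_samples))
    return idx[:n_train], idx[n_train:]

# Made-up normalized values, for illustration only.
T = [0.10, 0.40, 0.50, 0.90]   # actual outputs
P = [0.12, 0.38, 0.55, 0.85]   # model outputs
R = correlation_coefficient(P, T)
R2 = R ** 2                    # ASSUMPTION: R2 taken as the square of R
error = rmse(P, T)
train_idx, test_idx = split_70_30(47)
```

With 47 projects this split gives 33 training and 14 testing samples. R and R² measure how well the outputs track the targets, while RMSE measures the size of the errors, so the two kinds of index can rank models differently.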

Model architecture
Defining the network structure is the most difficult and essential step in building a neural model. The network must be trained repeatedly, varying characteristics of the ANN model such as the transfer function, the number of neurons, and the maximum number of iterations, to find the configuration that gives the minimum training and testing errors and the highest R on the validation data.
Several trials were carried out. The optimal network for the cost and delay models, with the minimum validation error and the highest correlation coefficient, consisted of five hidden nodes with a maximum of 1000 iterations, the Tansig (hyperbolic tangent sigmoid) activation function in the hidden and output layers, and a 70%/30% training/testing ratio. Tables 5, 6, 7, and 8 show the different architectures of the ANN models tested to find the optimal structure. Tables 5 to 8 show that selecting five neurons with the Tansig activation function gave better outcomes, with the highest values of R and R² and the lowest RMSE. Hence, an ANN with a Tansig activation function and a single hidden layer containing five neurons was adopted, as represented in Figure 2.

Development of models to predict cost and delay using ANN
The ANN-BP model, with the hyperbolic tangent activation function and trained by BP with the LMA optimization algorithm, was used as mentioned above. The optimal ANN model that gave acceptable performance consisted of a single hidden layer with five neurons. Figure 4 shows the results of the training and testing phases of the ANN-BP model for cost prediction. The performance indices of the ANN-BP model in the training phase were 0.972, 0.946, and 0.026 for R, R², and RMSE, respectively, as indicated in Figure 4 (a). In the testing phase, they were 0.948, 0.899, and 0.028, respectively, as shown in Figure 4 (b). The ANN-BP model performed well in cost prediction, since R and R² were close to 1 and RMSE was close to 0. Figure 5 shows the cost predicted by the ANN-BP model for each project in the testing phase.
Figure 6 shows the results of the ANN-BP model for delay prediction. The performance indices of the ANN-BP model in the training phase were 0.995, 0.991, and 0.022 for R, R², and RMSE, respectively, as shown in Figure 6 (a). In the testing phase, they were 0.986, 0.971, and 0.024, respectively, as shown in Figure 6 (b). The ANN-BP model therefore performed better in delay prediction than in cost prediction, with higher values of R and R² and a lower RMSE. Figure 7 shows the delay predicted by the ANN-BP model for each project in the testing phase.

Development of models to predict cost and delay using ANN-PSO
The training and testing phases of the ANN-PSO model showed acceptable results in predicting cost, as illustrated in Figures 8 (a) and (b). The performance indices of the ANN-PSO model in the training phase were 0.924, 0.855, and 0.050 for R, R², and RMSE, respectively. In the testing phase, they were 0.929, 0.863, and 0.044, respectively. The model exhibited a high correlation, since R and R² approached 1 and the RMSE was low, not exceeding 0.05. The predicted and measured costs for each project in the testing phase using the ANN-PSO model are illustrated in Figure 9.
Figure 10 shows the results of the ANN-PSO model for delay prediction as scatter diagrams of the relation between the measured and actual delay. The performance indices of the ANN-PSO model in the training phase were 0.938, 0.879, and 0.091 for R, R², and RMSE, respectively, as shown in Figure 10 (a). In the testing phase, they were 0.998, 0.996, and 0.094, respectively, as shown in Figure 10 (b). In the testing phase, the ANN-PSO model thus predicted delay better than cost in terms of R and R² (0.998 and 0.996 versus 0.929 and 0.863), although its RMSE was higher for delay (0.094 versus 0.044). Figure 11 shows the delay predicted by the ANN-PSO model for each project in the testing phase.
To sum up, the ANN-PSO model showed good results. However, in the testing phase the ANN-BP model was better than ANN-PSO in cost prediction on all three indices and gave a lower RMSE in delay prediction (0.024 versus 0.094), while ANN-PSO gave the higher R and R² in delay prediction (0.998 and 0.996 versus 0.986 and 0.971). Similar observations were reported by Tareq et al. (2020) [15], who applied PSO to the estimation of the cost and duration of construction projects. A series of 60 completed government projects was used to build the proposed models. Eight input parameters were used: volume of bricks, volume of concrete, footing type, number of elevators, total floor area, ground floor area, number of floors, and security status. The results showed that PSO models can be an alternative approach to evaluating the cost and/or duration of construction projects. The developed model provided high prediction accuracy, 0.97 and 0.99, with a low mean. A comparison of the models' results indicated that predicting with PSO was significantly more precise.
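The ANN-PSO scheme compared in this section replaces gradient-based training with a swarm search over the network weights. The sketch below shows one plausible way to do this, under stated assumptions: each particle is a flat vector holding all weights and biases of a network with ten inputs, five hidden neurons, and one output (61 values), and the fitness is the training RMSE. The encoding, the PSO parameters (w = 0.7, C1 = C2 = 1.5), the initial weight range, and the synthetic data are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

N_IN, N_HID, N_OUT = 10, 5, 1
DIM = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # 50 + 5 + 5 + 1 = 61

def unpack(particle):
    """Split a flat particle vector into weight matrices and bias vectors."""
    i = 0
    W1 = particle[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = particle[i:i + N_HID]; i += N_HID
    W2 = particle[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = particle[i:i + N_OUT]
    return W1, b1, W2, b2

def predict(particle, X):
    """Network output for every row of X, with tanh in both layers."""
    W1, b1, W2, b2 = unpack(particle)
    return np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

def fitness(particle, X, t):
    """Training RMSE of the network encoded by the particle."""
    return float(np.sqrt(np.mean((predict(particle, X) - t) ** 2)))

def train_ann_pso(X, t, n_particles=10, n_iter=1000,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, DIM))
    vel = np.zeros_like(pos)
    p_best = pos.copy()
    p_cost = np.array([fitness(p, X, t) for p in pos])
    history = [float(p_cost.min())]          # best fitness after each iteration
    for _ in range(n_iter):
        g_best = p_best[np.argmin(p_cost)]   # best position found by the swarm
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = pos + vel
        cost = np.array([fitness(p, X, t) for p in pos])
        better = cost < p_cost
        p_best[better], p_cost[better] = pos[better], cost[better]
        history.append(float(p_cost.min()))
    return p_best[np.argmin(p_cost)].copy(), history

# Synthetic stand-in data: 33 training samples, ten inputs in [0, 1].
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0.0, 1.0, size=(33, 10))
t = X.mean(axis=1)
best, history = train_ann_pso(X, t, n_iter=200)
```

Because personal bests are only replaced by better positions, the recorded best fitness never increases from one iteration to the next. Unlike LMA, this search needs no gradients, which is convenient but can cost accuracy on smooth problems, a pattern that would be consistent with the lower RMSE values reported above for ANN-BP.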

Conclusions
Based on the investigations done in this study, the following conclusions are drawn:
➢ The current study examined the capability of the ANN-BP and ANN-PSO models in the cost and delay prediction of 47 construction projects.
➢ The developed ANN-BP model with the Levenberg-Marquardt algorithm (LMA) had good accuracy in predicting the cost and delay of construction projects. The ANN-BP model was better in delay prediction than in cost prediction.
➢ The ANN-PSO model with ten particles and five neurons showed good cost and delay prediction performance.
➢ Both models, ANN-BP and ANN-PSO, showed acceptable results in cost and delay prediction. However, the optimal model in cost prediction based on the performance indices (R, R², RMSE) was the ANN-BP model. In contrast, the ANN-PSO model was the optimal model in delay prediction based on R and R², although its RMSE was higher than that of the ANN-BP model.