Participation deadline:
August 8, 2023

Summer school: forecasting the electricity production of combined-cycle gas power plants

This challenge consists of forecasting the amount of energy produced by a power plant over several 48-hour periods.

Leaderboard
1. (2) HOLLO Fructueux Score 3,302033
2. (3) Yanel Score 3,408974
3. (7) Ramadone SANNY ABOKI Score 3,465422
This challenge is over.

1488 contributions

133 participants
Welcome to the Data Challenge EEIA!

 

This article will help you take your first steps in the world of data science and give you the tools to dive into the Combined Cycle Gas Turbine (CCGT) electricity production forecasting data challenge. The objective is to predict the production in megawatts (MW).

We will give you the technical information you need to install your development environment and describe the main steps a data scientist follows when approaching a problem. Several links are provided to explore the concepts covered in this article in more depth. A notebook is also available to help you develop your first model; it's up to you to improve it!

I. Setting up the environment

To get started with the data challenge, you will have to choose a programming language. In data science, the best known are Python and R. You can use the one you prefer, but if you are new to this domain, we advise you to use Python.

To start coding, you will have to download a development environment. This will allow you to create scripts, use packages and build your machine learning models. Among the available environments, we can cite Jupyter Notebook, PyCharm and Spyder for Python, and RStudio for R. To have access to all of these, we advise you to download Anaconda. This well-known open-source data science platform simplifies the management of environments and packages.
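Once the environment is installed, a quick check can confirm that the packages you plan to use are importable. The sketch below is only an illustration: the package list is an assumption (a typical Python data science stack), not a requirement stated by the challenge.

```python
import importlib.util

# Hypothetical package list for a typical data science setup; adjust to your needs.
REQUIRED_PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as missing can then be installed through Anaconda or pip before opening the notebook.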

Summary:

And also, some reference:

II. Exploratory Data Analysis

Now that you have set up your environment, you can use packages and write your scripts, and you are ready to (really) get started.

Several steps are necessary to get familiar with the dataset and make successful submissions in a data challenge. First, you need to read and analyze the data: in most data challenges, the datasets are not clean and can contain many corrupted values or have missing rows, columns, or samples. This first step is very important, because uncleaned data can cause problems when you apply a machine learning model and will greatly impact your results.

To carry out these analyses, you can either check descriptive statistics for each variable or visualize your data with plotting libraries. Plots give you visual insight that is easy to interpret and understand. Visualization is also very important for data scientists, as it allows them to communicate their insights and results.
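As a minimal sketch of the "statistics describing each variable" idea, the function below counts missing values and computes basic statistics for one column. It uses only the standard library so that it runs anywhere; in practice a library such as pandas would usually do this job. The column names come from the challenge's variable list, but the sample values are made up for illustration.

```python
import math
import statistics


def summarize(rows, column):
    """Return count, missing count, mean, min and max for one numeric column.

    `rows` is a list of dicts; a value counts as missing when it is None or NaN.
    """
    values = [row.get(column) for row in rows]
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Made-up sample values, for illustration only.
sample = [
    {"Amb temp (°C)": 21.0, "Net Power (MW)": 400.0},
    {"Amb temp (°C)": None, "Net Power (MW)": 410.0},
    {"Amb temp (°C)": 23.0, "Net Power (MW)": float("nan")},
]
for name in ("Amb temp (°C)", "Net Power (MW)"):
    print(name, summarize(sample, name))
```

A column with a high missing count or an implausible minimum or maximum is a good candidate for closer inspection with a plot.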

Data processing and transformation is often considered the most important part of a data challenge. Once you have analyzed the dataset thoroughly, you will have to take action: you can, for instance, correct corrupted data, transform a column to add information, or create new columns that will be useful in your study. This step matters because it turns raw data into clean, informative data, which improves your submissions and helps you climb the leaderboard.
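The two actions mentioned above (correcting missing values and creating a new column) could look like the following sketch. Median imputation and the `temp_gap` feature are illustrative choices, not the method prescribed by the challenge; the two temperature column names are taken from the challenge's variable list and the values are made up.

```python
import math
import statistics


def is_missing(value):
    """A value is treated as missing when it is None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_median(rows, column):
    """Return new rows where missing values of `column` are replaced by the median."""
    observed = [row[column] for row in rows if not is_missing(row[column])]
    median = statistics.median(observed)
    return [
        {**row, column: median if is_missing(row[column]) else row[column]}
        for row in rows
    ]


def add_temperature_gap(rows):
    """Add a hypothetical feature: compressor inlet minus ambient temperature."""
    return [
        {**row, "temp_gap": row["Comp inlet temp (°C)"] - row["Amb temp (°C)"]}
        for row in rows
    ]


# Made-up sample values, for illustration only.
raw = [
    {"Amb temp (°C)": 20.0, "Comp inlet temp (°C)": 22.0},
    {"Amb temp (°C)": None, "Comp inlet temp (°C)": 25.0},
    {"Amb temp (°C)": 24.0, "Comp inlet temp (°C)": 25.5},
]
clean = add_temperature_gap(impute_median(raw, "Amb temp (°C)"))
print(clean)
```

Both functions return new rows instead of modifying the input, which keeps the raw data available for comparison. To avoid leaking information, statistics such as the median are best computed on the training part only and then reused on the other parts.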

III. Modeling
A. Splitting the dataset

Now that we are done manipulating our data, we must build a strategy to learn the relationships between the variables, and in particular how the plant's power output evolves with respect to its past values and the other variables.

Separating data into training and testing sets is important in evaluating machine learning models.


Why split the training dataset?

We split the training data into a training set and a testing set, which gives us the opportunity to evaluate our model's performance fairly without submitting results.

Typically, around 80% of the full training data is allocated to the training set and 20% to the testing set, which is held out for evaluation. In some cases, when we deal with models that have hyperparameters, we need to split the dataset into three parts: 60% for training, 20% for validation and 20% for testing. The training part allows our model to learn the combinations of variables that best fit the data, and we use the validation set to evaluate the performance of our model and tune hyperparameters during training. The idea behind splitting the training data into training and validation sets is to avoid overfitting: the model becomes very good on the training data because it learns to fit those points exactly, but does poorly on other data. We then say that it does not generalize well.

Finally, we use the test set to have a fair evaluation of our model that is independent of the training process.
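A minimal sketch of the 60% / 20% / 20% split described above is given below. It assumes the rows are already sorted in time and cuts them without shuffling, so that validation and test data always come after the training data; this ordering is an assumption of the example, and a random split is also common when time dependence is ignored.

```python
def chronological_split(rows, train_frac=0.6, val_frac=0.2):
    """Split time-ordered rows into train, validation and test parts, without shuffling."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    n = len(rows)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


train, val, test = chronological_split(list(range(10)))
print(len(train), len(val), len(test))
```

Setting `val_frac=0` and `train_frac=0.8` gives the simpler 80% / 20% split, with an empty validation part.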

To sum up, the first part, the training set, is used to learn the dynamics of the time series. The second part, the validation set, is used to evaluate our model, avoid overfitting and fine-tune the hyperparameters. Once we have used these two sets, we need a third, independent one, the test set, to estimate the model's accuracy. Cross-validation is an approach that allows us to make better use of our dataset and to evaluate the model more reliably.
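For time-ordered data, one common form of cross-validation is the expanding-window scheme sketched below: each fold trains on the past and validates on the block that follows. This is one possible variant, chosen here for illustration; the function name and the block layout are assumptions of the example. scikit-learn's `TimeSeriesSplit` provides a similar scheme.

```python
def expanding_window_folds(n_samples, n_folds):
    """Yield (train_indices, validation_indices) pairs for time-ordered data.

    The data is cut into n_folds + 1 consecutive blocks; fold k trains on the
    first k blocks and validates on block k + 1.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold also takes any leftover samples.
        val_end = n_samples if k == n_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, val_end))


for train_idx, val_idx in expanding_window_folds(n_samples=10, n_folds=3):
    print(len(train_idx), val_idx)
```

Averaging the validation metric over the folds gives a more stable estimate than a single split, at the cost of training the model several times.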

For more detail, here is a link.

B. Modeling

Now we can move on to building our machine learning model. A wide variety of models can be applied to tackle this data challenge. In this article, we will give you some ideas about what can be done.

To deal with time series data, we can separate two main approaches:

  • The first approach tackles the problem without considering the time dependence. Each time step is predicted from its own characteristics, i.e. the values taken by the different variables at that time step. This is the approach developed in the notebook.

Any machine learning model can be used, from linear regression to deep learning. The advantage of this approach is that we can draw on the best models in machine learning. Its limitation is that it does not consider temporal aspects such as trend and seasonality.

  • The second approach relies on time series models, which try to learn the time series as a whole and take the temporal aspect into account. Many models exist, such as autoregressive models like ARIMA, or exponential smoothing. Some packages are available to deal with time series; among them, we can cite Prophet.

Here is a great resource for getting familiar with time-series data: Forecasting: Principles and Practice.
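To make the second approach concrete, here is a minimal sketch of simple exponential smoothing, one of the time series models mentioned above. The initialisation (first level equal to the first observation) and the function name are choices made for this example; dedicated packages offer more complete implementations.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels l_t = alpha * y_t + (1 - alpha) * l_{t-1}.

    The first level is initialised to the first observation. The last level
    serves as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


# Made-up production values in MW, for illustration only.
levels = simple_exponential_smoothing([400.0, 410.0, 405.0], alpha=0.5)
print("next-step forecast:", levels[-1])
```

A value of `alpha` close to 1 follows the latest observations closely, while a small value produces a smoother, slower-reacting forecast; `alpha` is a hyperparameter that can be tuned on the validation set.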

It is possible to combine these two approaches to get the best of both worlds. How?

Ensemble models are very popular in Kaggle competitions. The idea is to combine different models into a stronger one, a “supermodel,” that predicts better than each model alone.
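The simplest form of ensembling is a weighted average of the predictions of several models, sketched below. This is only one way to build an ensemble (stacking and boosting are others); the function name and the example values are made up for illustration.

```python
def blend_predictions(predictions, weights=None):
    """Combine several models' predictions by a weighted average, point by point.

    `predictions` is a list of equally long lists, one per model.
    """
    if not predictions:
        raise ValueError("need at least one prediction vector")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all prediction vectors must have the same length")
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(len(predictions[0]))
    ]


# Made-up predictions (MW) from two hypothetical models.
model_a = [400.0, 410.0]
model_b = [404.0, 406.0]
print(blend_predictions([model_a, model_b]))
```

The weights can be chosen on the validation set, for example by giving more weight to the model with the lower validation error.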

C. Quantifying the results

Quantifying the results is essential to understand whether the model is performing as expected. There are multiple ways to do so. Below, we list some metrics and diagnostic tools:

MAE - Mean Absolute Error: the mean of the absolute differences between paired observations expressing the same phenomenon (here, predictions and true values).

RMSE - Root Mean Square Error: the square root of the mean of the squared residuals (prediction errors). When the residuals have zero mean, it equals their standard deviation.

R2: the proportion of the variance in the dependent variable that is predictable from the independent variables.

Dickey-Fuller Test: this test is used to check whether a time series is stationary or non-stationary (for example, because of a trend).

ACF/PACF - Autocorrelation Function / Partial Autocorrelation Function: the autocorrelation function gives the correlation of the series with itself at pairs of time points separated by a given lag. Partial autocorrelations measure the linear dependence between the series and one of its lags after removing the effect of the intermediate lags.
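The error metrics and the autocorrelation described above can be computed directly from their definitions. The sketch below uses only the standard library and made-up values; libraries such as scikit-learn and statsmodels provide equivalent, more complete functions.

```python
import math


def mae(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean of the squared prediction errors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return num / denom


# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
print(autocorrelation(y_true, lag=1))
```

In this example a single large error affects the RMSE more than the MAE, which illustrates why the two metrics can rank models differently.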


Alongside these quantitative analyses, it is also very important to visualize the results to understand how the trained model performed on the test data. By plotting your results, you will be able to compare models and see where you can improve them.

 

Once we have found the best model for our dataset using the methods described above, we begin forecasting by applying it to new dates and features that were not used or seen in the training dataset.

 

IV. Submission

Once you have made your predictions, it is time to submit your results! For this, part of the test set on the platform is reserved to evaluate your model and to give you an indication, by means of a metric, of the quality of your model and predictions.

 

Another part is private and is reserved for the final evaluation when the data challenge is over. The objective is to prevent participants from overfitting on the public test set.

Real-life applications: In business projects, we combine the forecasted data with the actual (training) data to visualize the performance. Data visualization is very important in business, as it is self-explanatory and can be presented and shared easily with end users. Tools such as Power BI or Tableau are powerful ways to visualize and analyze the forecasting period.

Moreover, from a practical point of view, forecasting electricity production helps to better match demand and to integrate intermittent renewable energy, such as wind, into the energy mix.

 

Data challenges are an opportunity to move from theory to practice and to learn a lot. Please take advantage of it, and don't forget to be creative!

The challenge rules can be consulted here:
https://bit.ly/3DAprBH

An energy supplier since 2003, TotalEnergies has also been an electricity producer since 2016, and is therefore present across the entire electricity value chain. To meet the growing demand for electricity, we are continuing our expansion in the renewable energy market with the aim of being in the world's top 5 by 2030, with 100 GW of production capacity. But for intermittent energies to fully play their role, we are also developing our large-scale electricity storage capacities and complementing our power generation capacity with gas, the lowest-emitting fossil fuel.

In recent years, we have acquired gas-fired power plants (CCGT) to increase our power generation capacity. In Europe, we now have 9 combined-cycle gas power plant units spread across 7 sites: 5 in France, one in Belgium and one in Spain, for a production capacity of 4 GW, with the objective of reaching 7-10 GW in 2030.


Source: https://www.totalenergies.fr/nous-decouvrir

But do you actually know what a CCGT is?

Combined-cycle gas power plants (CCGT = Combined Cycle Gas Turbine) consist of a gas turbine driven by the combustion of compressed air and natural gas, and of a steam cycle that uses the residual energy of the hot gases produced to boil the fluid of a second thermodynamic cycle. The steam obtained drives a second electricity-generating turbine. Efficiency is greatly improved (up to 60%, versus 35% for an open-cycle gas turbine) and polluting emissions are greatly reduced (up to 50% fewer polluting emissions for the same amount of electricity supplied).

These power plants are an essential component of the energy mix. They contribute fully to securing the supply of the French power system by offering a flexible response to the variability of demand.

And where does the hackathon come in…

As industrial assets for electricity production, CCGTs are equipped with sensors that make it possible to monitor and predict electricity production. For this hackathon, you will use the production history made available to you to train your models and provide a CSV* with the predicted production, in megawatts, for the test data.

* CSV with ";" as the separator and "." as the decimal mark.
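A minimal sketch of writing a file in this format with Python's standard `csv` module is shown below. Only the ";" separator and the "." decimal mark come from the challenge description; the header names, the three-decimal rounding and the example values are assumptions made for illustration, so check the platform for the exact columns expected.

```python
import csv
import io


def write_submission(predictions, file, header=("id", "Net Power (MW)")):
    """Write (id, prediction) rows as CSV with ';' separator and '.' decimals.

    The header names are hypothetical; use the ones required by the platform.
    """
    writer = csv.writer(file, delimiter=";", lineterminator="\n")
    writer.writerow(header)
    for row_id, value in predictions:
        # format() with a fixed-point spec always uses '.' as the decimal mark.
        writer.writerow([row_id, format(value, ".3f")])


# Made-up predictions, written to an in-memory buffer for the example.
buffer = io.StringIO()
write_submission([(0, 401.25), (1, 398.5)], buffer)
print(buffer.getvalue())
```

To produce a real file, the buffer can be replaced by `open("submission.csv", "w", newline="", encoding="utf-8")`, where the file name is again only an example.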

Train and test: link to the files

To train your production (MW) prediction models, you will have access to sensor data covering 80% of one year of production of a CCGT plant, at a one-minute time step.

You will then submit your results with the predicted production in megawatts for the remaining 20%.

Amb temp (°C): Ambient temperature
Comp inlet temp (°C): Compressor inlet temperature
amb pressure: Ambient pressure
HR %: Ambient relative humidity (%)
C/H: Carbon/hydrogen ratio of the natural gas
Network Frequency (Hz): Frequency of the electrical grid, in Hz
Lower Heating Value (Wh/Nm3): Lower heating value of the natural gas
EOH (h): Equivalent Operating Hours
DP filtre: Pressure drop across the gas turbine inlet air filters
CTRL anti givrage: Control of the opening valve of the gas turbine inlet anti-icing system
IGV %: Opening percentage of the IGV (Inlet Guide Vanes), used to control the CCGT load
Net Power (MW): NET electricity production generated by the CCGT

Your model will be evaluated with the MAE (Mean Absolute Error), which is the mean of the absolute value of the difference between your prediction \(\hat{y}_{i}\) and the true production value \(y_{i}\) over a sample of \(N\) observations.

$$ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left | y_{i} - \hat{y}_{i} \right | $$

The smaller the MAE, the better the model fits the data.
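As a worked illustration with made-up numbers (not taken from the challenge data), take \(N = 3\) observations with true values \(y = (400, 410, 395)\) MW and predictions \(\hat{y} = (398, 415, 395)\) MW:

$$ \mathrm{MAE} = \frac{1}{3}\left( \left| 400 - 398 \right| + \left| 410 - 415 \right| + \left| 395 - 395 \right| \right) = \frac{2 + 5 + 0}{3} \approx 2.33 \text{ MW} $$

The MAE is expressed in the same unit as the target, so a score of about 3.3 on the leaderboard corresponds to an average error of about 3.3 MW.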

1. (2) HOLLO Fructueux 16 contributions 08/08/23 08:39 Score 3,302033
2. (3) Yanel 3 contributions 08/08/23 04:59 Score 3,408974
3. (7) Ramadone SANNY ABOKI 40 contributions 06/08/23 23:43 Score 3,465422
4. (12) Moussinou MAMA 6 contributions 08/08/23 17:40 Score 3,501170
5. (1) Joaïda Précieuse AKOTENOU 23 contributions 08/08/23 22:28 Score 3,505567
6. (18) Maths_AI 47 contributions 04/08/23 19:47 Score 3,570027
7. (21) Fidèle M'PO 51 contributions 07/08/23 00:07 Score 3,600026
8. (5) Constantin DJOSSA 53 contributions 08/08/23 18:38 Score 3,672977
9. (9) MEDO Aures 33 contributions 08/08/23 19:15 Score 3,680234
10. (15) Jules KOUALODE 12 contributions 08/08/23 13:24 Score 3,698929
11. (22) IMELDA AGOSSOU 9 contributions 07/08/23 17:35 Score 3,706322
12. (6) Gg Jjj 39 contributions 03/08/23 12:10 Score 3,760079
13. (26) ML IA 1 contribution 08/08/23 09:39 Score 3,764375
14. (11) Kenneth HOUNDEGLA 30 contributions 08/08/23 22:43 Score 3,862929
15. (27) Helkias AKPOVI 24 contributions 04/08/23 15:43 Score 3,867217
16. (10) Théoctiste Romaric Balthasar DJOSSOU 23 contributions 07/08/23 23:05 Score 3,890329
17. (13) OLAOMO Emmanuel 20 contributions 08/08/23 07:48 Score 3,923515
18. (51) Jordan HOUANSOU-BLE 8 contributions 08/08/23 21:05 Score 3,965777
19. (17) AMOUZOU ABRAHAM BERLINO 6 contributions 08/08/23 23:18 Score 3,980133
20. (30) LAURA META 10 contributions 07/08/23 01:45 Score 3,984308
21. (8) Sourou Emmanuel 32 contributions 08/08/23 23:51 Score 4,009852
22. (16) Wilfrid houndenou 11 contributions 08/08/23 23:19 Score 4,025463
23. (40) Bio Mourou OROU BOUYAGUI 8 contributions 08/08/23 22:16 Score 4,032253
24. (19) Sèmako Roland HONFO 41 contributions 08/08/23 22:53 Score 4,049061
25. (4) Judes 13 contributions 07/08/23 23:40 Score 4,050874
26. (41) Elvis LOKOSSOU 11 contributions 08/08/23 21:54 Score 4,054846
27. (35) EEIA si je savais 13 contributions 07/08/23 21:00 Score 4,167633
28. (58) Abdourahamane Idé Salifou 28 contributions 04/08/23 22:32 Score 4,188186
29. (70) STHYVE JUNIOR TATHO DJEANOU 16 contributions 03/08/23 11:49 Score 4,220902
30. (38) Bienvenu DABOUGOU 30 contributions 08/08/23 05:04 Score 4,222705
31. (14) Fresnel Feischola Alapini 26 contributions 08/08/23 22:20 Score 4,227300
32. (28) Junior Lissanon 21 contributions 07/08/23 18:34 Score 4,245815
33. (44) Marciano DJINATO 18 contributions 08/08/23 22:33 Score 4,272722
34. (34) CharleSTon 24 contributions 07/08/23 18:12 Score 4,283672
35. (33) Kévin KPAKPO 3 contributions 27/07/23 19:53 Score 4,291594
36. (42) @ A 36 contributions 02/08/23 17:21 Score 4,311201
37. (25) Michel GODONOU 21 contributions 07/08/23 03:41 Score 4,313504
38. (43) Fresnel AKAN 11 contributions 07/08/23 18:02 Score 4,327563
39. (49) Fédel FOLLY 16 contributions 30/07/23 20:43 Score 4,356150
40. (32) Mireille Gloria Founmilayo ODOUNFA 24 contributions 30/07/23 21:05 Score 4,360482
41. (39) Ousséni BIO KOUMAZAN 33 contributions 03/08/23 15:15 Score 4,364735
42. (48) Harlette Denebeye 42 contributions 08/08/23 23:50 Score 4,372961
43. (24) Roger AGONSANOU 32 contributions 02/08/23 00:05 Score 4,375167
44. (37) Olympe ATCHATIN 47 contributions 04/08/23 19:02 Score 4,388786
45. (47) Ymar 18 contributions 04/08/23 18:14 Score 4,388999
46. (20) Gilchrist TOCHOEDO 19 contributions 08/08/23 22:52 Score 4,405297
47. (54) ALI Assanatou 8 contributions 03/08/23 09:33 Score 4,415911
48. (45) Alan OROU N'GOBI 12 contributions 31/07/23 07:50 Score 4,422967
49. (31) ESSE ANICET AMOUSSOU 46 contributions 02/08/23 20:59 Score 4,426443
50. (53) Farid ABOUBAKARI 5 contributions 06/08/23 23:03 Score 4,446040
51. (23) Romaric Assogba 17 contributions 29/07/23 09:37 Score 4,455515
52. (62) DJONNONHS Cornélia Adéyèmi 2 contributions 07/08/23 23:57 Score 4,462001
53. (36) Zo-Hary RALAMBOHARISOA 13 contributions 07/08/23 23:11 Score 4,471754
54. (50) Sophie AHOLOU 1 contribution 04/08/23 16:17 Score 4,472881
55. (56) Pulcherie MANLEY 10 contributions 08/08/23 08:27 Score 4,474147
56. (69) Loto 6 contributions 05/08/23 02:11 Score 4,482762
57. (59) Fabiola SABOUTEY TETTEY 5 contributions 06/08/23 01:06 Score 4,512288
58. (60) Alex AMIGBATIN 18 contributions 29/07/23 12:15 Score 4,514089
59. (61) MASJT 3 contributions 04/08/23 17:59 Score 4,525432
60. (46) Friedrich WEKENON TOKPONTO 20 contributions 30/07/23 13:58 Score 4,550665
61. (52) HigH DOVE 9 contributions 02/08/23 23:08 Score 4,551433
62. (57) Bonaventure AGONHOUN 11 contributions 08/08/23 20:24 Score 4,563384
63. (67) Félix Akovognon KPIKPONSOU 2 contributions 08/08/23 22:59 Score 4,566234
64. (63) Johannès HOUNSINOU 15 contributions 04/08/23 20:27 Score 4,592808
65. (75) Arnella Agbodjalou 4 contributions 04/08/23 08:06 Score 4,607541
66. (74) Jr 4 contributions 08/08/23 22:58 Score 4,618151
67. (65) Mashkourath TCHANI 11 contributions 02/08/23 21:15 Score 4,632317
68. (68) Jerry 4 contributions 08/08/23 14:57 Score 4,682002
69. (64) Anonyme [XXX] 12 contributions 05/08/23 11:07 Score 4,715384
70. (72) Sergio Bossou 3 contributions 28/07/23 14:29 Score 4,769458
71. (55) Perseverance HOUESSOU 12 contributions 05/08/23 18:17 Score 4,775856
72. (29) Judicaël WAOUNWA 9 contributions 07/08/23 22:36 Score 4,802539
73. (66) Marius CODJO BLIGUI 2 contributions 05/08/23 15:00 Score 4,893729
74. (80) Julien Yendoupabe KOLANI 10 contributions 05/08/23 06:34 Score 4,947626
75. (89) Sogoma 8 contributions 05/08/23 23:00 Score 5,112745
76. (71) LONTCHEDJI Roméo 6 contributions 29/07/23 18:59 Score 5,175952
77. (73) Gio 10 contributions 04/08/23 21:27 Score 5,367404
78. (76) Olivier DJOGBENOU 5 contributions 07/08/23 16:28 Score 5,536706
79. (77) Adéchina A Hospice ATCHADE 8 contributions 29/07/23 14:21 Score 5,825541
80. (86) Ulysse LARY 8 contributions 06/08/23 13:40 Score 5,866691
81. (81) Mannondé 2 contributions 30/07/23 19:04 Score 6,126026
82. (83) Fructueux Arnaud ASSOGBA 1 contribution 07/08/23 16:46 Score 6,211219
83. (102) Martin AVAHOUNLIN 1 contribution 03/08/23 15:52 Score 6,561718
84. (78) Raissa Kpotchai 4 contributions 02/08/23 18:21 Score 6,876218
85. (79) Aquilas AKPAKI 11 contributions 29/07/23 14:16 Score 6,902830
86. (91) BORNA Yannis 2 contributions 08/08/23 22:34 Score 6,923447
87. (85) NSIMBA MATONDO Lewi''s Ravel 2 contributions 07/08/23 07:04 Score 6,936778
88. (92) Eyitayo Hodonou 1 contribution 08/08/23 09:22 Score 6,962482
89. (82) Spéro NOUDOHOUENOU 10 contributions 05/08/23 09:05 Score 6,967258
90. (90) ZANTOU Fèmi Hilary 4 contributions 03/08/23 19:11 Score 6,980627
91. (95) Pernelle NOUBADAN 2 contributions 09/08/23 00:18 Score 7,024191
92. (99) Elysée MFISUMUKIZA 4 contributions 02/08/23 18:34 Score 7,053681
93. (93) Tagnon ZANNOU 1 contribution 02/08/23 19:16 Score 7,106800
94. (96) Margie-Morgane LWAMUGUMA 1 contribution 04/08/23 02:20 Score 7,145845
95. (94) ZANNOU Boris 1 contribution 27/07/23 18:34 Score 7,176128
96. (87) Rudolf Worou 2 contributions 19/07/23 12:31 Score 7,196310
97. (84) Junior 2 contributions 06/08/23 23:33 Score 7,217862
98. (97) Jean-Marie NDAYISABA 1 contribution 08/08/23 06:41 Score 7,252632
99. (88) AROUKOUN Amandine Rose Kpèdetin 3 contributions 08/08/23 20:02 Score 7,367678
100. (98) Gutz... 1 contribution 08/08/23 19:12 Score 7,450895
101. (100) Jodick Ndayisenga 2 contributions 03/08/23 21:08 Score 7,730203
102. (101) Era 2 contributions 08/08/23 14:56 Score 8,169841
103. (103) lla 9 contributions 01/08/23 16:51 Score 10,224039
104. (105) Anonyme 1 contribution 08/08/23 15:49 Score 13,919984
105. (104) Djes-Fresy BILENGA MOUKODOUMA 2 contributions 01/08/23 16:47 Score 17,864924