Participation deadline: October 4, 2021

Data challenge Air Liquide - Total - Phase 2

Entry Level - Phase 2
July 5 - October 4


In this second phase of the competition, you are asked to predict the wind power of six farms over several 48-hour periods.

This challenge is over.

376 contributions, 10 participants

All the rules and details about the Data Challenge can be downloaded here:
https://bit.ly/3ygG9C4

Wind power is a fast-growing source of renewable energy in Europe. However, its rapid development causes many challenges for the electricity market and especially for power system operators who must deal with uncertainty in wind power generation when making scheduling and dispatch decisions. In this context, short-term prediction (up to 48 hours) is a key tool to navigate this variability and to operate power systems more efficiently with large wind power penetration.

As in the first phase, you do not have access to the locations of the farms. The data covers the period from the 1st hour of 2009/07/01 to the 24th hour of 2012/07/12.

The test phase still consists of 157 periods of 48 hours between 2011/01/01 and 2012/07/02, for which you will have access to the forecasts of the wind characteristics and must predict the wind power for the six farms. Note that each of these periods is separated from the adjacent ones by a 36-hour period.
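As a rough illustration, the test calendar described above can be reconstructed in a few lines of Python. This sketch assumes that the first test period starts at 00:00 on 2011/01/01 and that each 48-hour test period is immediately followed by the 36-hour separating period. The exact start hour is not stated here, so `FIRST_START` and the helper name `make_periods` are hypothetical.

```python
from datetime import datetime, timedelta

TEST_HOURS = 48      # length of one test period (from the description)
GAP_HOURS = 36       # period separating two consecutive test periods
N_PERIODS = 157      # number of test periods
FIRST_START = datetime(2011, 1, 1)  # assumption: first period starts at midnight


def make_periods(first_start=FIRST_START):
    """Return a list of (start, end) datetimes, with end exclusive."""
    step = timedelta(hours=TEST_HOURS + GAP_HOURS)
    length = timedelta(hours=TEST_HOURS)
    return [(first_start + k * step, first_start + k * step + length)
            for k in range(N_PERIODS)]


periods = make_periods()
total_test_hours = N_PERIODS * TEST_HOURS
print(periods[0], periods[-1], total_test_hours)
```

Under these assumptions the last period ends at 00:00 on 2012/07/02, and the 157 periods contain 7,536 hourly timestamps in total.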

The training data consists of two parts:

  • wind forecasts and wind power from 2009/07/01 to 2011/01/01, the initial training phase
  • wind forecasts and wind power for the 36-hour phases between the 157 test periods, on which you can retrain your models

Compared to the first phase, more forecasts are available: in the first phase of the competition, you had access to only one forecast at each point in time; now you will have access to four. This is because we provide forecasts 48 hours ahead that are updated every 12 hours, so each point in time is covered by four successive forecasts.
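To make the count of four concrete, here is a small sketch that lists the forecast runs covering a given hour. It assumes that runs are issued at 00:00 and 12:00 and cover lead times of 1 to 48 hours. Neither detail is stated above, and the function name `covering_runs` is purely illustrative.

```python
from datetime import datetime, timedelta

HORIZON_H = 48   # each forecast run reaches 48 hours ahead
UPDATE_H = 12    # a new run is issued every 12 hours


def covering_runs(target):
    """Return (issue_time, lead_hours) for every run that covers `target`.

    Assumes runs are issued on the hour at multiples of UPDATE_H
    (00:00 and 12:00) and cover lead times 1..HORIZON_H hours.
    """
    runs = []
    for lead in range(1, HORIZON_H + 1):
        issue = target - timedelta(hours=lead)
        if issue.hour % UPDATE_H == 0:
            runs.append((issue, lead))
    return runs


runs = covering_runs(datetime(2011, 1, 3, 5))
leads = [lead for _, lead in runs]
print(leads)
```

Whatever the target hour, this returns four runs, ordered from the most recent (shortest lead time) to the oldest.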

Another novelty compared to phase 1 is that you are no longer allowed to use the future to predict the past. For example, the notebook we provided in phase 1 already broke that rule: the linear model was fitted on the whole training dataset, including points that lie in the future relative to some points of the test dataset. That was perfectly allowed in phase 1, but it is no longer allowed in phase 2. This point will be checked when we audit the solutions at the end of phase 2.
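One possible reading of this rule is a walk-forward scheme: before predicting a test period, fit only on observations with timestamps strictly earlier than the start of that period. The sketch below illustrates the idea with a deliberately trivial stand-in model (the mean of past power). The data layout and the name `walk_forward_baseline` are assumptions made for the example, not the organizers' audit procedure.

```python
from datetime import datetime
from statistics import mean


def walk_forward_baseline(labelled, periods):
    """Predict each test period using only observations that precede it.

    labelled: list of (timestamp, power) pairs with known power
    periods:  list of (start, timestamps_to_predict) pairs
    The "model" is just the mean of past power, a stand-in for a real one.
    """
    predictions = {}
    for start, timestamps in periods:
        past = [power for t, power in labelled if t < start]  # no future data
        baseline = mean(past)
        for t in timestamps:
            predictions[t] = baseline
    return predictions


labelled = [
    (datetime(2010, 12, 31, 22), 0.2),
    (datetime(2010, 12, 31, 23), 0.4),
    (datetime(2011, 1, 3, 0), 0.9),   # known only after the first test period
]
periods = [
    (datetime(2011, 1, 1, 0), [datetime(2011, 1, 1, 0), datetime(2011, 1, 1, 1)]),
    (datetime(2011, 1, 4, 12), [datetime(2011, 1, 4, 12)]),
]
predictions = walk_forward_baseline(labelled, periods)
print(predictions)
```

The observation dated 2011/01/03 influences only the second test period, because it lies in the future of the first one.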

To recap, compared to phase 1, there are three novelties: six farms instead of one; four forecasts instead of one for each farm; and a stricter rule on what you may train on (no use of the future to predict the past).

Please note that to be admissible, your algorithm must be auditable and you will be required to share it with the organization team at the end of phase 2. AutoML is not allowed for this Data Challenge.

Following the signature of the “Manifeste IA” in 2019, Air Liquide and Total initiated discussions on industrial Data Science and Artificial Intelligence.

Among the topics of common interest is a firm belief that data is key to solving societal challenges, especially in the energy transition, where Air Liquide and Total are actively involved.

This belief is further reinforced by the fact that data is now everywhere, internally and externally, and more than ever within reach of an increasing number of users, whether data experts or practitioners.

It quickly became apparent that jointly organizing a data challenge would be a great opportunity to continue promoting data science and sharing best practices in both companies while tackling an energy transition topic such as renewable energy. Wind power was chosen as the topic for this challenge, which starts on May 10, 2021.

The challenge will be split into two phases:

  • Phase 1: Individual participation, open to all employees of both companies, to further practice data skills and connect with communities of like-minded peers through an entry-level challenge
  • Phase 2: Creation of mixed teams of two people each, one from Air Liquide and one from Total, to cross-fertilize the diversity of ideas, methodologies and techniques through a more advanced challenge

Wind power is a fast-growing source of renewable energy. However, its rapid development creates many challenges for the electricity market and especially for power system operators, who must deal with uncertainty in wind power generation when making scheduling and dispatch decisions. In this context, short-term prediction is key to operating power systems more efficiently with larger wind power penetration.

The training set contains 19,033 lines with the power produced by the 6 farms. The test set contains 7,536 lines for which you need to predict the power produced by the 6 farms. Alongside these train and test sets, the wind forecasts are provided in a separate file. You need to combine these files in order to predict the power of the 6 farms on the test set.

“wp1” to “wp6” are the target variables: the normalized power generated by each of the farms. As in phase 1, “u”, “v”, “ws” and “wd” are forecasted characteristics of the wind around the farm:

  • “u” is the zonal wind component of the wind vector,
  • “v” is the meridional wind component of the wind vector,
  • “ws” is the wind speed, the L2 norm of the wind vector,
  • “wd” is the wind direction, the angle between the wind vector and the north.

Note that the components “u” and “v” are expressed with the wind vector azimuth convention. You can find more information here.
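The relationship between the four variables can be sketched as follows. This example assumes that the wind vector azimuth convention means the direction the wind is blowing toward, measured clockwise from north, and that “wd” is expressed in degrees. Both points should be checked against the provided data, and the function name `speed_and_direction` is only illustrative.

```python
import math


def speed_and_direction(u, v):
    """Return (ws, wd) from the zonal (u) and meridional (v) components.

    Assumption: wd is in degrees, measured clockwise from north, and points
    where the wind is blowing to, i.e. u = ws*sin(wd) and v = ws*cos(wd).
    """
    ws = math.hypot(u, v)                          # L2 norm of (u, v)
    wd = math.degrees(math.atan2(u, v)) % 360.0    # angle from north, wrapped to 0..360
    return ws, wd


ws, wd = speed_and_direction(3.0, 4.0)
print(ws, wd)
```

If the recomputed values do not match the provided “ws” and “wd” columns, the other common convention (the direction the wind comes from) would correspond to `math.atan2(-u, -v)` instead.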

The performance of your model is evaluated using the Mean Absolute Error (MAE) between your prediction \(\hat{y}_{i}\) and the ground truth \(y_{i}\) over the \(N\) samples of the test set:

$$ MAE = \frac{1}{N}\sum_{i=1}^{N}\left| y_{i} - \hat{y}_{i} \right| $$

where \(N\) is now the number of lines (dates and hours) multiplied by the number of farms.

The lower the MAE, the better your model is on the test set.
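For local validation, the metric can be sketched in plain Python as below. The row layout (one dict per line with keys “wp1” to “wp6”) and the helper names are assumptions for the example; the official scoring code is not shown here. With the 7,536 test lines and 6 farms mentioned earlier, \(N\) would be 45,216.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE over two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


FARMS = ["wp1", "wp2", "wp3", "wp4", "wp5", "wp6"]


def flatten(rows):
    """Turn one dict per line into a flat list: one value per (line, farm)."""
    return [row[farm] for row in rows for farm in FARMS]


# Toy data: two lines, six farms each, so N = 12.
truth = [dict.fromkeys(FARMS, 0.5), dict.fromkeys(FARMS, 0.1)]
preds = [dict.fromkeys(FARMS, 0.4), dict.fromkeys(FARMS, 0.4)]
score = mean_absolute_error(flatten(truth), flatten(preds))
print(len(flatten(truth)), score)
```

Flattening over farms before averaging gives every (line, farm) pair the same weight, which matches the definition of \(N\) given above.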

Leaderboard
1. (1) Yen SUN & Chahinez NAZEL, 30 contributions, 2021/10/04 22:36, score 0.107868
2. (2) Simon GRAH & Paul BERHAUT, 16 contributions, 2021/10/04 22:21, score 0.111729
3. (3) Norman DARIL & Medhat MAHANNI, 58 contributions, 2021/10/02 21:41, score 0.112520
4. (4) Vincent HAGUET & Lucy CHEN, 66 contributions, 2021/10/04 18:36, score 0.113060
5. (7) Stefano FRAMBATI & Magdalena KOCIOLEK, 8 contributions, 2021/10/02 02:10, score 0.115727
6. (5) Paul MARTIN & Sergei PARSHIN, 78 contributions, 2021/10/04 17:47, score 0.115970
7. (6) Rami NAMMOUR & Xiang KAN, 23 contributions, 2021/10/02 07:04, score 0.118207
8. (8) Nommie KASHANI & Lok Yin WONG, 21 contributions, 2021/10/03 15:04, score 0.119774
9. (9) Vincent LEVORATO & Vianney PITTAVINO, 11 contributions, 2021/09/17 00:16, score 0.120032
10. (10) Morgane VIVES & Gerald CASTEROU, 47 contributions, 2021/09/23 12:20, score 0.125162