Machine learning is a fascinating field, and it is remarkable how many problems it can be applied to with great results. Today, we’re focusing on whether machine learning can predict the winning team in a soccer match. If it can, what’s the method, and what were our results?

Soccer is the most popular sport on the planet. People play it, watch it, and bet on it. Betting on soccer is widely considered unpredictable, although that belief is not backed by substantial research. For example, the 2015/2016 Premier League season produced an extremely unexpected champion: Leicester City, whose chances of winning the title at the start of the campaign were 1 in 5000 [1].

The primary goal of our research was to design a supervised machine learning algorithm that predicts soccer match results from game statistics. The project also shows how difficult such predictions are. Readers with formal machine learning training may be able to dig further into the intricacies and achieve more accurate results.

Problem Statement

If you plan to use machine learning to predict soccer matches, your project should work toward the following goals:

First, design a web-scraping robot that collects all the details of each match.
Next, automate the web-scraping procedure for every match of the season.
Then, build a supervised machine learning model that predicts match outcomes.
Finally, evaluate the model.
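The third step, building the supervised model, can be illustrated with a deliberately simple classifier. The article does not name the algorithm it used, so the sketch below swaps in a nearest-centroid classifier written with Python’s standard library. The two features and the labels "win", "draw" and "loss" are illustrative assumptions. The features are average goals scored before the match and points from the last five matches.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Average the feature vectors of each outcome class (e.g. win/draw/loss)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the outcome whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical toy data: [avg goals before the match, points in last five matches]
features = [[2.5, 9], [2.1, 10], [1.2, 4], [1.0, 5], [0.6, 1], [0.4, 2]]
labels = ["win", "win", "draw", "draw", "loss", "loss"]
centroids = fit_centroids(features, labels)
print(predict(centroids, [2.0, 8]))  # closest to the "win" centroid
```

Nearest centroid is only a baseline. In practice you would scale the features first, so that a variable with a large range does not dominate the distance.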

ETL and Data Exploration

The very first step is web scraping.

Before analyzing the information, it is essential to understand how it was gathered. This section therefore describes the web-scraping robot we developed to build the database. Once the robot can pull the data for a single match, the next step is to write additional code. That code collects the URLs of every match in the season and feeds them to the robot automatically.
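Below is a minimal sketch of the second piece of code, which collects every match URL from a season page and hands each one to the single-match robot. It uses only Python’s standard library. It assumes, hypothetically, that match pages are linked with hrefs containing "/match/". `scrape_match` stands in for the existing robot and is not a real library function.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MatchLinkParser(HTMLParser):
    """Collects <a href> values that look like match pages.

    The "/match/" marker is a hypothetical URL convention; adapt it to the
    real site's URL scheme.
    """

    def __init__(self, base_url, marker="/match/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            full = urljoin(self.base_url, href)  # resolve relative links
            if full not in self.urls:  # keep page order, skip duplicates
                self.urls.append(full)


def season_match_urls(season_html, base_url):
    """Return every distinct match URL found on a season page."""
    parser = MatchLinkParser(base_url)
    parser.feed(season_html)
    parser.close()
    return parser.urls


def scrape_season(season_html, base_url, scrape_match):
    """Run the single-match robot (hypothetical scrape_match) on every URL."""
    return [scrape_match(url) for url in season_match_urls(season_html, base_url)]
```

Passing the robot in as a function keeps URL discovery separate from per-match scraping. Either part can then be tested on saved HTML without touching the network.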

To get the data clean and ready for analysis, we took the following steps:

Select columns: Keep only the columns that do not contain many zero values.
Issues: The data collection did not go without a hitch. The scraped tables included rows holding each team’s totals of its players’ stats, so these rows had to be removed from the gathered data.
Group by match and team: Since the data were collected per player, the players’ rows had to be aggregated by team and by match.
Append the result: The player tables did not include the game’s outcome, so we wrote an additional program to add each result to the DataFrame.
Place: We also needed an algorithm to identify whether the team played at home or away.
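The cleaning steps above can be sketched in plain Python on a list of per-player rows. Everything concrete here is an assumption for illustration:

- the column names (`match_id`, `team`, `player`, and the stat columns);
- marking team-total rows with `player == "Total"`;
- the 50% zero threshold;
- the `fixtures` mapping that supplies each match’s teams and score.

The sketch drops the total rows before selecting columns, so they cannot distort the zero counts.

```python
from collections import defaultdict


def drop_total_rows(rows):
    """Remove team-total rows mixed in with player rows.
    Assumes (hypothetically) that those rows carry player == "Total"."""
    return [r for r in rows if r["player"] != "Total"]


def select_columns(rows, stat_cols, max_zero_share=0.5):
    """Keep stat columns whose share of zero values is at most max_zero_share."""
    if not rows:
        return []
    return [c for c in stat_cols
            if sum(1 for r in rows if r[c] == 0) / len(rows) <= max_zero_share]


def group_by_match_and_team(rows, stat_cols):
    """Sum the players' stats into one row per (match_id, team)."""
    totals = defaultdict(lambda: dict.fromkeys(stat_cols, 0))
    for r in rows:
        key = (r["match_id"], r["team"])
        for col in stat_cols:
            totals[key][col] += r[col]
    return totals


def add_result_and_place(totals, fixtures):
    """Attach place (home/away) and result (win/draw/loss) to each team row.
    fixtures: match_id -> (home_team, away_team, home_goals, away_goals)."""
    table = []
    for (match_id, team), stats in totals.items():
        home, away, home_goals, away_goals = fixtures[match_id]
        place = "home" if team == home else "away"
        scored, conceded = ((home_goals, away_goals) if place == "home"
                            else (away_goals, home_goals))
        if scored > conceded:
            result = "win"
        elif scored == conceded:
            result = "draw"
        else:
            result = "loss"
        table.append({"match_id": match_id, "team": team, "place": place,
                      "result": result, **stats})
    return table


# Hypothetical usage:
# rows = drop_total_rows(raw_rows)
# cols = select_columns(rows, ["shots", "passes", "red_cards"])
# table = add_result_and_place(group_by_match_and_team(rows, cols), fixtures)
```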

Data Pre-Processing

A game’s statistics cannot be collected before its outcome is known, so the model needs variables that are available before kickoff. To solve this, every variable has to be rebuilt from information that exists before the match.

Put simply, each match we’re trying to predict must be described using only the games played before it. Suppose a team plays on the 20th of November. The code then computes, for each variable, the average over all of that team’s games before that date.
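The "only use earlier games" rule can be sketched as a running average for one team. The row layout (a `date` field plus numeric stat columns) is an assumption. The key detail is timing: each match’s averages are computed before that match’s own numbers are added to the running totals. That way, no post-match information leaks into the features.

```python
from datetime import date


def pre_match_averages(matches, stat_cols):
    """For one team's matches, average each stat over strictly earlier games.

    The first match has no history, so its averages are None.
    """
    matches = sorted(matches, key=lambda m: m["date"])
    running = dict.fromkeys(stat_cols, 0.0)
    out = []
    for n, match in enumerate(matches):
        if n == 0:
            avgs = dict.fromkeys(stat_cols, None)
        else:
            avgs = {col: running[col] / n for col in stat_cols}
        out.append({"date": match["date"], **avgs})
        # Update the totals only after this match's averages are stored.
        for col in stat_cols:
            running[col] += match[col]
    return out


# Hypothetical history for one team (rows need not arrive in date order).
history = [
    {"date": date(2020, 11, 20), "shots": 6},
    {"date": date(2020, 11, 1), "shots": 10},
    {"date": date(2020, 11, 8), "shots": 14},
]
averages = pre_match_averages(history, ["shots"])
# The 20 November row uses only the 1 and 8 November games: (10 + 14) / 2
```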

The code also creates several variables that capture each team’s form: the points earned over its last five matches, its last three matches, and its previous match. A win is worth three points, a draw one point, and a loss zero points. This way, the model can tell how well a team has been doing based on its recent wins, draws, and defeats.
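Here is a sketch of the form variables under the 3/1/0 scoring described above. It assumes each team’s results are available as a date-ordered list of "win", "draw" or "loss". Early in the season a team has fewer than five (or three) previous games, so the sums simply cover whatever matches exist. How the original code handled that case is not stated.

```python
POINTS = {"win": 3, "draw": 1, "loss": 0}


def form_features(results, windows=(5, 3, 1)):
    """For each match, sum the points from the previous k games for each k.

    results: one team's outcomes in date order. Only games played before
    the match in question are counted, so the features are pre-match.
    """
    points = [POINTS[r] for r in results]
    rows = []
    for i in range(len(points)):
        rows.append({f"points_last_{k}": sum(points[max(0, i - k):i])
                     for k in windows})
    return rows


results = ["win", "draw", "loss", "win", "win", "draw"]
features = form_features(results)
# Sixth match: last five = 3+1+0+3+3, last three = 0+3+3, previous = 3
```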

So, what were our results?

We did not have high hopes for exact predictions. The results? Inconclusive: the data showed no clear patterns, which made it tough to reach a firm conclusion.

We ran into some complex problems and applied several formulas to address them. However, we did not get to the bottom of them, as none of us had taken dedicated AI or ML courses. We believe that, with that training, we could have dug deeper and obtained much more accurate results.

What’s the Reflection?

Our experiment showed that soccer prediction is complicated and requires more variables to produce reliable results. Still, it suggests that a machine learning algorithm can “think” about which team is going to win, and who to bet on, more accurately than someone who knows nothing about the game. You can expect an advantage of almost 17% over predictions made purely by chance.
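The article does not say how the "almost 17%" figure was measured. One common approach is to compare the model’s accuracy with chance-level baselines. This is shown here as an assumption, not as our exact procedure. The first baseline is uniform guessing over the three outcomes, which scores 1/3. The second is always predicting the most frequent outcome.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Share of matches whose outcome was predicted correctly."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def chance_baselines(actual, n_classes=3):
    """Two reference points for a win/draw/loss problem:
    uniform random guessing, and always predicting the most common outcome."""
    uniform = 1 / n_classes
    majority = Counter(actual).most_common(1)[0][1] / len(actual)
    return uniform, majority


# Hypothetical predictions, not our experiment's data.
predicted = ["win", "draw", "draw", "win", "win", "loss"]
actual = ["win", "win", "draw", "loss", "win", "loss"]
model_acc = accuracy(predicted, actual)
uniform, majority = chance_baselines(actual)
advantage_over_uniform = model_acc - uniform  # gap in accuracy vs pure chance
```

Reporting both baselines matters. A model can beat uniform guessing yet do no better than always picking the most frequent outcome.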

We can expect improvement.

For future work, we suggest identifying additional helpful factors, such as injuries, and gathering more information on the players in each team. In short, the more data you feed the model and the more variables you understand, the more accurate your machine learning predictions of soccer matches can become.

Data from soccer simulation games such as Pro Evolution Soccer and FIFA [2] could also help enrich the database. Another direction is to forecast the number of goals each team will score. This is harder, because the predicted goal counts must be consistent with the predicted match outcome.
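One standard way to make goal forecasts and result forecasts agree is to treat each team’s goals as independent Poisson variables. This is a swapped-in technique, not something from our experiment. Once you predict expected goals for both sides, the win, draw and loss probabilities follow from the same numbers. The expected-goal inputs below are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def outcome_probabilities(home_xg, away_xg, max_goals=10):
    """Derive P(home win), P(draw), P(away win) from expected goals.

    Assumes independent Poisson goal counts, truncated at max_goals per team
    (the probability mass beyond that is negligible for typical values).
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


home_win, draw, away_win = outcome_probabilities(1.5, 1.2)
```

Independent Poisson goals are a simplification. They do guarantee that the goal and result predictions never contradict each other.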

We wish we had had someone experienced on our team. Improvement is certainly possible; it just requires a deeper understanding.

Conclusion

Perhaps this experiment will inspire you to create more advanced and complex models. If we ever manage to get more accurate predictions, it would be a big boon for bettors. However, we’d say that only 50% of outcomes can be predicted with the existing data, because you never know what curveball life will throw at you!

[1] DN.PT

[2] Wiki