Oct 28, 2020

**Part 1 – What is predictive analytics?**

- Andrei Popescu, 4Cast Product Owner

Big Data, Machine Learning, Cloud Computing, Predictive Analytics. We hear “buzz words” like these more and more in our industry these days. To many, these words imply progress, a technical (or technological) advancement in the way we manage and operate our fields compared to current practices, which can drive efficiencies and ultimately increase profit margin. By and large, I think the general consensus is that we should be applying these concepts and techniques in our operations, but the field seems not only split on how to best do this, but also to some extent on what these things even mean for the oil and gas sector.

As we’ve recently released our new platform 4Cast, I’d like to take a deeper look specifically into predictive analytics, and how it can be applied to oil and gas data sets to help make more optimal development decisions in our fields. While I’ll focus on core concepts of predictive analytics, I’ll note that we will of course touch upon the other buzz words mentioned above as well. After all, **4Cast is a cloud-based platform designed specifically to streamline the application of statistical analysis and Machine Learning algorithms to oil and gas data**. I think I just hit most of the aforementioned buzz words in a single sentence 😊

So, what is predictive analytics? Broadly speaking, it’s a variety of statistical techniques that help us to analyze current and historical facts in order to make predictions about future or otherwise unknown events. I got that from Wikipedia, but I can’t think of a more accurate way to put it. More concisely for our purposes: **we want to use our existing Well data to predict what will happen with our future Wells**. The main reason we would want to do this is to make better, more informed development decisions. If we have a reasonable and reliable expectation of the outcome of our actions, we can “simulate” multiple possible scenarios, and choose the scenario that results in the most desirable outcome from among the simulations.

*Fig. 1 - A developed asset with 90+ Wells drilled over the course of several years. Completion systems and designs vary across all wells, as do the results. Wells are colored based on 12-month cumulative production.*

Let’s take as an example the situation shown above: an asset with several years of existing development. This year, we have a certain fixed budget to drill and complete new wells, and we need to manage that budget in order to maximize production while minimizing cost - a situation many can relate to, I’m sure. In order to make the best use of our budget, we need to understand the effects that different parameters will have on our outcome. In this case, the parameters are the known characteristics of our reservoir (geological, petrophysical, etc.), Well design (drilling target, well spacing/stacking, etc.), and Completion design (proppant/fluid concentration, stage/cluster spacing, etc.), and the outcome is production.

A common approach to try and increase our understanding is to turn to “Frac Modelling”, or using defined physical and mathematical equations or relationships to define the size and shape of our fractures in the subsurface. Don’t get me wrong, I think this is a noble endeavor and certainly has merit as **one of many variables** involved in the overall picture. However, I believe that by focusing too much on this singular aspect, which is incredibly difficult to validate, we risk missing out on a large number of insights our data can provide us without knowing the exact size and shape of each individual fracture.

Consider that we have access to decades’ worth of data where various Well and Completion designs, which are known quantities, were applied. The production results of these different trials are also known with a high degree of certainty. Combine these knowns with additional geologic and geophysical data which can constrain reservoir quality and wellbore point of contact. What we have then is a large data set with many known, independent variables and a dependent variable – this is exactly what we need in order to put predictive analytics and the rest of the buzz words to practical use. Given data sets where we know the inputs (or many of them at least) and we know the results of those inputs, we can use well-defined and rigorously tested statistical techniques to increase our understanding of the effects of each of our known variables on the outcome. Specifically, **we can train and refine data models using machine learning algorithms to be able to predict outcomes, based on the known input variables**.
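To make the idea of "known inputs, known results" a little more concrete, here is a minimal Python sketch. It is not how 4Cast works internally; it simply fits an ordinary least-squares line to one input variable and checks the fit against wells that were held out of training. The numbers are synthetic and purely illustrative, and the variable names (`proppant_lb_per_ft`, `cum_prod_12m_mbbl`) and the `fit_line` helper are my own assumptions for the example.

```python
from statistics import mean

# Synthetic, illustrative values only (NOT real well data):
# one known input (proppant intensity, lb/ft) and the known result
# (12-month cumulative production, Mbbl) for seven hypothetical wells.
proppant_lb_per_ft = [1000, 1200, 1500, 1800, 2000, 2200, 2500]
cum_prod_12m_mbbl = [82, 90, 111, 126, 141, 150, 172]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Train on the first five wells, hold out the last two to test the model
# on results it has not seen.
train_x, test_x = proppant_lb_per_ft[:5], proppant_lb_per_ft[5:]
train_y, test_y = cum_prod_12m_mbbl[:5], cum_prod_12m_mbbl[5:]

slope, intercept = fit_line(train_x, train_y)
predictions = [intercept + slope * xi for xi in test_x]

# Mean absolute error on the held-out wells, in Mbbl.
mean_abs_error = mean(abs(p - a) for p, a in zip(predictions, test_y))
print(f"slope={slope:.4f}, intercept={intercept:.1f}, MAE={mean_abs_error:.2f}")
```

A real data set would have many input variables and would likely call for a more flexible model, but the pattern is the same: fit on wells where both inputs and results are known, then test on wells the model has not seen.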

*Fig. 2 - A general outline for a predictive modeling workflow that can be applied to any dataset where we have known independent variables alongside known results. We’ll go into more detail for each of these steps in the subsequent articles using the example asset shown in Fig. 1.*

Using the example situation from above of wanting to maximize production while minimizing cost, let’s first define our strategy for doing this in concrete steps:

- Define the target variable that we want to predict which can help inform our strategic decisions - in this case, our target will be 12-month Production
- Compile and visualize the available data in a consistent format, and one that can be directly compared to our target - since 12-month Production is measured at the Well level, our data should be organized and reported at the Well level also
- Identify existing trends/patterns within our data, and define dependent relationships
- Select and preprocess input variables - **these input variables should be quantities that can be known with a high degree of confidence prior to drilling new wells**
- Build, test, and refine data models until we have one which can accurately predict historic results based on the defined input variables
- Simulate a large number of potential future development options, and use the data model to predict the results
- Identify the simulated option which is predicted to achieve the optimal result
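
The last two steps above (simulate many options, then pick the best one) can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `predict_12m_production` takes the place of a trained data model, `well_cost` is an invented cost function, and the budget and design values are assumptions chosen only to illustrate the mechanics.

```python
from itertools import product


def predict_12m_production(proppant_lb_per_ft, stage_spacing_ft):
    """Hypothetical stand-in for a trained data model (Mbbl per well)."""
    return 20 + 0.06 * proppant_lb_per_ft - 0.05 * stage_spacing_ft


def well_cost(proppant_lb_per_ft, stage_spacing_ft):
    """Hypothetical cost model ($MM per well), invented for illustration."""
    return 4.0 + 0.001 * proppant_lb_per_ft + 200.0 / stage_spacing_ft


BUDGET_MM = 62.0  # assumed fixed annual budget, $MM

# Enumerate every combination of candidate completion designs.
scenarios = []
for proppant, spacing in product([1000, 1500, 2000, 2500], [150, 200, 250]):
    cost = well_cost(proppant, spacing)
    n_wells = int(BUDGET_MM // cost)  # whole wells affordable within budget
    total = n_wells * predict_12m_production(proppant, spacing)
    scenarios.append({
        "proppant_lb_per_ft": proppant,
        "stage_spacing_ft": spacing,
        "n_wells": n_wells,
        "total_production_mbbl": total,
    })

# Pick the scenario with the highest predicted total 12-month production.
best = max(scenarios, key=lambda s: s["total_production_mbbl"])
print(best)
```

In practice the grid would cover many more variables, and the "best" scenario is only as trustworthy as the model behind it, which is why the build, test, and refine step comes first.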

In the subsequent articles in this series, I’m going to tackle each of the steps we just outlined using 4Cast, and a real data set. By doing so, we’ll take a deeper dive into what I believe are the key components of predictive analytics. Hopefully, this framework and example will help you to outline whatever questions are most important to your Team and apply an analytical approach to answering those questions.

Thanks for taking the time to read through my introductory thoughts on this topic, and hopefully I’ve piqued your interest to come back for the next edition. I would welcome anyone who has additional insights or would like to discuss anything I’ve mentioned here in greater detail to please chime in in the comments or reach out to us directly. Please be sure to follow us for the latest news on 4Cast, and to catch the follow-up articles in the coming weeks.