Applying Machine Learning Approach for Forecasting Production of Oil-bearing Intervals in Mature Reservoirs

Author information – Trung Doan¹, Tung Phi², Kien Tran¹, Long Hoang¹

Affiliation – ¹i2G Solutions Inc, ²VietsovPetro

March 12, 2026

Summary

In mature offshore fields in Vietnam, overlooked pay is often identified in low-resistivity, low-contrast sand reservoirs and in behind-casing intervals that were bypassed during initial development. Additional perforation campaigns targeting these zones have delivered millions of barrels of incremental oil. However, candidate selection still needs to improve in order to maximize recovery of the remaining resources while reducing the risk of perforating water-bearing intervals. Commingled production from multiple sand bodies with different reservoir pressures and permeabilities, combined with limited production logging tool (PLT) data, makes conventional physics-based approaches such as reservoir simulation too slow and too uncertain to support timely re-perforation decisions.

The motivation for adopting machine learning, specifically symbolic regression, stems from the need for a flexible, data-driven alternative capable of capturing non-linear relationships between static inputs (well-log and mud-log measurements) and dynamic responses (production behavior). Unlike black-box models, symbolic regression produces an explicit mathematical expression that reservoir engineers can inspect, assess, and validate against established domain knowledge. This study presents the current symbolic-regression-based framework for production allocation and outlines a future multi-method workflow aimed at identifying missing-pay intervals and quantifying water breakthrough risk under weakly supervised conditions.

Theory

Data Description:

The analytical framework is powered by a multi-modal dataset from over 90 offshore wells that demonstrate significant remaining production potential.

Static well logs: open-hole logs, including Gamma Ray (GR), Deep Resistivity (LLD), Sonic Travel Time (DT), Neutron Porosity (NPHI), and Density (RHOB), sampled at 0.1-meter intervals.
Dynamic input: well maturity and space in platform, plus historical monthly records of aggregate well-level oil and water production rates, with histories spanning from 12 to over 160 months per well.
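As a rough illustration of how these two data types might be held together per well, the sketch below uses plain Python dataclasses. All class and field names are hypothetical, units are left unspecified because the abstract does not state them, and the water-cut formula (water rate divided by total liquid rate) is the standard definition rather than something confirmed by the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogSample:
    """One open-hole log sample (hypothetical layout, 0.1 m spacing assumed)."""
    depth_m: float
    gr: float    # Gamma Ray
    lld: float   # Deep Resistivity
    dt: float    # Sonic Travel Time
    nphi: float  # Neutron Porosity
    rhob: float  # Density


@dataclass
class MonthlyRecord:
    """Aggregate well-level production for one month."""
    month_index: int
    oil_rate: float
    water_rate: float


@dataclass
class Well:
    name: str
    logs: List[LogSample]
    production: List[MonthlyRecord]

    def water_cut(self) -> List[Optional[float]]:
        """Water cut per month: water / (oil + water); None when nothing flows."""
        cuts = []
        for rec in self.production:
            total = rec.oil_rate + rec.water_rate
            cuts.append(rec.water_rate / total if total > 0 else None)
        return cuts


# Hypothetical values for illustration only; not data from the study.
well = Well(
    name="W-1",
    logs=[LogSample(2000.0, 45.0, 12.0, 90.0, 0.22, 2.35)],
    production=[MonthlyRecord(0, 300.0, 100.0), MonthlyRecord(1, 0.0, 0.0)],
)
print(well.water_cut())
```

Keeping the logs depth-indexed and the production month-indexed in the same record makes the mismatch in resolution explicit: supervision exists only on the monthly, well-level side.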

Primary Goal: The objective is to utilize well logs and aggregate production data to identify unperforated (overlooked-pay) intervals and forecast their oil production potential. This is achieved by predicting production contributions across specific reservoir zones, even when supervision is restricted to the well-level.

The Core Challenge: Back-Allocation under Weak Supervision. A critical technical bottleneck is that production is measured only at the well level, because zonal flow profiling is scarce (e.g., limited PLT data). This research addresses a weakly supervised production analytics problem in which depth-resolved production allocation must be inferred indirectly from aggregate signals. In these commingled systems, many different interval-level allocations can match the same total well-level production, so a robust framework is required to learn the latent interval contributions.
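The non-uniqueness of allocation can be seen in a toy case. The sketch below uses made-up numbers and hypothetical interval names (none of them from the study) to show two very different interval-level splits that both honor the same well-level rate, which is why the well total alone cannot identify the true zonal contributions.

```python
# Toy illustration with hypothetical numbers: two different interval-level
# allocations reproduce the same well-level oil rate.
well_rate = 300.0  # hypothetical well-level oil rate, arbitrary units

allocation_a = {"sand_1": 200.0, "sand_2": 75.0, "sand_3": 25.0}
allocation_b = {"sand_1": 50.0, "sand_2": 100.0, "sand_3": 150.0}


def matches_well_total(allocation, total, tol=1e-9):
    """Return True if the interval contributions sum to the well-level total."""
    return abs(sum(allocation.values()) - total) <= tol


for label, allocation in (("A", allocation_a), ("B", allocation_b)):
    print(label, matches_well_total(allocation, well_rate))
```

Additional structure, such as a single expression shared across wells and tied to log responses, is what narrows the set of admissible allocations.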

Figure 1. Description of Data input and Data output

Figure 2. Logplot of a well

Figure 3. Well-level Oil Rate and Well-level Water Cut of a Well

Method

Theoretical Foundation: Symbolic Regression via Genetic Programming (GP). To address the back-allocation problem, the study employs Symbolic Regression (SR) based on Genetic Programming, an evolutionary algorithm that automatically derives mathematical models linking petrophysical properties to historical production.

Hierarchical Tree Structures: GP represents symbolic expressions as trees where internal nodes are mathematical operators (arithmetic, power laws, exponentials, logarithms) and terminal nodes are well-log inputs and production time.
Interpretability: Unlike “black-box” models, GP evolves human-readable expressions, allowing domain experts to audit the underlying logic and ensure compatibility with reservoir physics.
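A minimal sketch of such a tree representation is given below. It is not the authors' implementation: the operator set, the protected division and logarithm (a common GP convention, assumed here), and the example expression are all illustrative, and the input names GR, LLD, and t are placeholders for well-log inputs and production time.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Node:
    """Internal node: an operator applied to one or more child sub-trees."""
    op: str
    children: Tuple["Tree", ...]


# A tree is an operator node, an input name (str), or a numeric constant.
Tree = Union[Node, str, float]

# Assumed operator set; division and log are "protected" to avoid math errors.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,
    "log": lambda a: math.log(abs(a)) if abs(a) > 1e-12 else 0.0,
    "exp": lambda a: math.exp(min(a, 50.0)),  # cap the argument to avoid overflow
}


def evaluate(tree: Tree, inputs: Dict[str, float]) -> float:
    """Recursively evaluate a tree for one set of input values."""
    if isinstance(tree, Node):
        return OPS[tree.op](*(evaluate(child, inputs) for child in tree.children))
    if isinstance(tree, str):
        return inputs[tree]
    return float(tree)


def to_string(tree: Tree) -> str:
    """Render the tree as a human-readable expression for expert review."""
    if isinstance(tree, Node):
        return f"{tree.op}({', '.join(to_string(c) for c in tree.children)})"
    return str(tree)


# Hypothetical expression, not one reported by the study:
# log(LLD) * exp(-0.01 * t) / GR
expr = Node("div", (
    Node("mul", (Node("log", ("LLD",)),
                 Node("exp", (Node("mul", (-0.01, "t")),)))),
    "GR",
))
print(to_string(expr))
value = evaluate(expr, {"LLD": 10.0, "GR": 40.0, "t": 0.0})
```

Because the model is just this tree, printing it is enough for a domain expert to check, for example, that predicted contribution rises with resistivity and declines with time.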

Workflow

Algorithmic Training and Evolutionary Mechanics: The training process follows a structured evolutionary workflow to ensure generalizability across heterogeneous reservoir zones:

Population Evolution: An initial random population of equations is evolved through Crossover (swapping sub-trees between parents) and Mutation (random structural changes).
Multi-Well Fitness Evaluation: Candidate models are evaluated simultaneously across multiple wells to avoid local overfitting.
Complexity Control: Tree depth and node counts are regulated to promote parsimony, ensuring the equations remain physically meaningful.
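The three mechanics above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the study's training code: trees are nested tuples, selection is plain truncation with elitism, the fitness is mean squared error averaged over wells plus a node-count penalty, and the wells are synthetic with a made-up target (oil = LLD - GR). All names and parameter values are hypothetical.

```python
import random

OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
TERMINALS = ["GR", "LLD", "t"]  # hypothetical input names


def random_tree(rng, max_depth):
    """Grow a random tree: (op, left, right), an input name, or a constant."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS) if rng.random() < 0.7 else round(rng.uniform(-1, 1), 2)
    return (rng.choice(list(OPS)), random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))


def evaluate(tree, row):
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))
    return row[tree] if isinstance(tree, str) else tree


def size(tree):
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def paths(tree, prefix=()):
    """Yield the index path to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the sub-tree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    """Copy a random sub-tree of parent `b` into a random position of parent `a`."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))


def mutate(rng, tree):
    """Replace a random sub-tree with a freshly grown one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))


def fitness(tree, wells, penalty=0.01):
    """Lower is better: MSE averaged over all wells plus a parsimony penalty."""
    per_well = [sum((evaluate(tree, r) - r["oil"]) ** 2 for r in rows) / len(rows)
                for rows in wells]
    return sum(per_well) / len(per_well) + penalty * size(tree)


def evolve(wells, rng, pop_size=60, generations=30, max_depth=4):
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, wells))
        parents = population[: pop_size // 4]   # truncation selection
        children = list(parents[:2])            # elitism: keep the two best
        while len(children) < pop_size:
            child = crossover(rng, rng.choice(parents), rng.choice(parents))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if depth(child) <= max_depth:       # complexity control
                children.append(child)
        population = children
    return min(population, key=lambda t: fitness(t, wells))


def make_well(rng, n=8):
    """Synthetic well with a made-up target; not data from the study."""
    rows = []
    for _ in range(n):
        gr, lld, t = rng.random(), rng.random(), rng.random()
        rows.append({"GR": gr, "LLD": lld, "t": t, "oil": lld - gr})
    return rows


data_rng = random.Random(0)
wells = [make_well(data_rng) for _ in range(3)]
best = evolve(wells, random.Random(1))
print(best, fitness(best, wells))
```

Scoring every candidate on all wells at once means an expression that fits one well by coincidence is penalized by the others, while the depth limit and size penalty keep the surviving expressions short enough to read.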

Figure 4. Flowchart diagram of typical genetic programming process

Depth-Resolved Implementation and Profiling: Once a robust symbolic expression is derived, it is applied to the 0.1 m-interval log data to calculate a theoretical oil contribution for every depth point. These localized predictions are integrated across all perforated intervals and scaled to match actual monthly production. This back-allocation process enables the construction of a cumulative depth-based production profile, providing the insight necessary for selective stimulation or re-perforation of unexploited zones.
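The integrate-and-scale step can be sketched as follows. The function and variable names are hypothetical, the numbers are made up, and two choices are assumptions rather than details from the study: negative predictions are clipped to zero, and the scale factor derived from the perforated intervals is reused to express the potential of unperforated samples.

```python
from itertools import accumulate


def back_allocate(depths, scores, perforations, well_monthly_oil, step=0.1):
    """Scale per-depth predictions so perforated samples sum to the well total.

    depths           -- sample depths in meters
    scores           -- symbolic-expression output at each depth
    perforations     -- list of (top, base) perforated intervals in meters
    well_monthly_oil -- measured well-level oil for the month
    """
    def is_perforated(z):
        return any(top <= z <= base for top, base in perforations)

    # Assumption: clip negative predictions and weight by sample thickness.
    raw = [max(s, 0.0) * step for s in scores]
    perforated_total = sum(r for z, r in zip(depths, raw) if is_perforated(z))
    if perforated_total <= 0.0:
        raise ValueError("no positive predicted contribution in perforated intervals")

    scale = well_monthly_oil / perforated_total
    allocated = [r * scale if is_perforated(z) else 0.0 for z, r in zip(depths, raw)]
    potential = [r * scale for r in raw]  # includes unperforated samples
    return allocated, potential


# Hypothetical example: ten samples at 0.1 m spacing, one perforated interval.
depths = [round(2000.0 + 0.1 * i, 1) for i in range(10)]
scores = [0.0, 0.1, 0.5, 1.0, 1.5, 1.0, 0.8, 0.2, -0.1, 0.0]
allocated, potential = back_allocate(depths, scores, [(2000.2, 2000.5)], 1200.0)
cumulative_profile = list(accumulate(allocated))  # cumulative depth-based profile
```

In this toy case the sample at 2000.6 m carries no allocated production because it is unperforated, yet it shows a non-zero potential, which is the kind of signal that would flag a candidate interval for re-perforation.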

Results

Historical Production Matching: The final evolved expressions demonstrated high predictive fidelity, achieving a strong match between predicted and measured oil rates and water cut across a diverse set of pilot wells (Figure 5, Figure 6).

Figure 5. Plots showing predicted vs. actual oil production rate over time for sample wells

Figure 6. Plots showing predicted vs. actual water cut over time for sample wells

Depth-Resolved Profiling: Beyond surface-level rates, the framework successfully performed back-allocation to generate comprehensive depth-based production profiles. These visualizations (e.g., Figure 7, Figure 8) explicitly pinpointed high-contribution “sweet spots” within perforated segments, providing a level of detail rarely attainable without specialized zonal flow measurements.

Figure 7. Bar plots of depth-wise interval contributions, showing oil produced per interval (perforated only).

Figure 8. Bar plots of depth-wise interval contributions, showing oil rate per interval (all intervals in the reservoir)

To further validate the generalizability and practical utility of the framework, a blind test was conducted on an independent test well not included in the initial training sequence. The results demonstrated that the model accurately captured the predictive relationships between petrophysical log signatures and oil potential, with the identified “sweet spots” strictly aligning with favorable log responses such as low gamma ray and high resistivity (Figure 9).

Figure 9. A cross check between log responses vs. forecasted oil rate at missing pays

This successful blind-test validation underscores the model’s robustness and reliability as a diagnostic indicator. It confirms the system’s ability to provide high-fidelity insights into hydrocarbon-bearing zones, even in complex, commingled environments where traditional surveillance data is scarce.

Observations

Hydrocarbon Signatures: Analysis of the depth-wise contributions revealed that high-performing intervals consistently aligned with lower Gamma Ray (GR) values and higher Deep Resistivity (LLD) readings, reinforcing the model’s ability to recognize traditional petrophysical indicators of hydrocarbon-bearing zones.

Water rate prediction remains less accurate than oil, as inflow often originates from unlogged or bypassed horizons outside current perforated intervals. Furthermore, existing assumptions of bottom-up encroachment do not account for injection-well dynamics or inter-layer crossflow, necessitating the integration of more comprehensive reservoir data and injection parameters in future workflows.

Conclusions

Efficient Reservoir Surveillance: This research confirms that Symbolic Regression provides a lightweight, data-driven alternative to traditional hydrodynamic simulation, particularly for rapid re-perforation assessments in mature reservoirs.

Solving the “Missing Pay” Problem: The focus of future work is to expand this framework into a weakly supervised inference system specifically designed to identify overlooked oil-bearing intervals that remain unperforated.

Novel/Additive Information

The primary innovation lies in the transition from standalone symbolic regression to an orchestrated, multi-method workflow. Key additive features include:

Expanded Data Integration: Incorporating downhole pressure measurements, injection rates, and zonal well test results.
Relational Context & Dependencies: Utilizing model fusion (GNN, GP, time-series) to model strict well correlations, capturing neighborhood effects like offset-well interference and shared injection-well dynamics.

