.. SPDX-License-Identifier: MIT

.. _cookbook/pgo:

===========================
Profile-Guided Optimisation
===========================

Profile-Guided Optimisation (PGO) is a two-phase process. In the GEN
phase you build an instrumented binary and run it with a representative
workload to collect profile data. In the USE phase you rebuild with that
profile data to produce an optimised binary. LIBRA automates the compiler
flags for both phases.

For the variable reference, see :cmake:variable:`LIBRA_PGO` in
:ref:`reference/variables`.

1. Add PGO presets
===================

.. code-block:: json

   {
     "configurePresets": [
       {
         "name": "pgo-gen",
         "inherits": "base",
         "cacheVariables": {
           "CMAKE_BUILD_TYPE": "Release",
           "LIBRA_PGO": "GEN"
         }
       },
       {
         "name": "pgo-use",
         "inherits": "base",
         "cacheVariables": {
           "CMAKE_BUILD_TYPE": "Release",
           "LIBRA_PGO": "USE"
         }
       }
     ],
     "buildPresets": [
       { "name": "pgo-gen", "configurePreset": "pgo-gen" },
       { "name": "pgo-use", "configurePreset": "pgo-use" }
     ]
   }
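
After configuring either preset (step 2 onwards), it can be worth
confirming that the value actually reached the CMake cache. The helper
below is a minimal sketch, not part of LIBRA: ``cache_value`` is a
hypothetical name, and the ``build/pgo-gen`` path in the example assumes
the ``base`` preset places each binary directory under ``build/<preset>``,
as the paths used elsewhere on this page suggest.

.. code-block:: bash

   # Print the value of a CMake cache entry, whatever its declared type.
   # Cache lines have the form NAME:TYPE=VALUE.
   cache_value() {
     local cache_file=$1 name=$2
     sed -n "s/^${name}:[^=]*=//p" "$cache_file"
   }

   # Example (assumed binary directory):
   #   cache_value build/pgo-gen/CMakeCache.txt LIBRA_PGO    # expect GEN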

2. GEN phase — build and profile
==================================

Build the instrumented binary and run it with a representative
workload. The workload should cover the hot paths you want the
compiler to optimise — typically your benchmarks or a realistic
subset of your test suite.

.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         clibra build --preset pgo-gen
         ./build/pgo-gen/my_application --representative-workload

   .. tab-item:: CMake

      .. code-block:: bash

         cmake --preset pgo-gen
         cmake --build --preset pgo-gen
         ./build/pgo-gen/my_application --representative-workload

3. Merge profile data (Clang only)
====================================

GCC writes ``.gcda`` files that the USE phase reads directly, so no merge
step is needed. Clang and Intel LLVM write ``.profraw`` files that must
be merged first:

.. code-block:: bash

   # Clang / Intel LLVM only
   llvm-profdata merge \
     -output=build/pgo-gen/default.profdata \
     build/pgo-gen/default*.profraw

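If several runs produced ``.profraw`` files under other names, merge
them all into a single ``.profdata``. The output below is not the default
``default.profdata``, so point the USE build at it with
:cmake:variable:`LIBRA_PGO_PROFILE_PATH`: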

.. code-block:: bash

   llvm-profdata merge \
     -output=build/pgo-gen/merged.profdata \
     build/pgo-gen/*.profraw
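
One way to end up with several raw profiles is to run the instrumented
binary once per workload. The sketch below assumes a Clang build and
relies on the LLVM profiling runtime's ``LLVM_PROFILE_FILE`` environment
variable, in which ``%p`` expands to the process ID so that runs do not
overwrite each other. The ``profile_runs`` helper, the ``run-%p`` naming
and the second workload flag are illustrative assumptions, and whether
overriding the output path suits the way LIBRA configures profiling
should be checked against your setup.

.. code-block:: bash

   # Run an instrumented binary once per workload argument, giving each
   # run its own raw profile under OUT_DIR.
   profile_runs() {
     local binary=$1 out_dir=$2
     shift 2
     local workload
     for workload in "$@"; do
       LLVM_PROFILE_FILE="$out_dir/run-%p.profraw" "$binary" "$workload" || return 1
     done
   }

   # Example (hypothetical workload flags):
   #   profile_runs ./build/pgo-gen/my_application build/pgo-gen \
   #     --representative-workload --benchmark

The ``build/pgo-gen/*.profraw`` glob in the merge command above would
then pick up every run.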

4. USE phase — build the optimised binary
==========================================

.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         clibra build --preset pgo-use

   .. tab-item:: CMake

      .. code-block:: bash

         cmake --preset pgo-use
         cmake --build --preset pgo-use

The compiler reads the profile data from the GEN build directory
automatically (LIBRA passes the correct ``-fprofile-use=`` path). The
resulting binary in ``build/pgo-use/`` is tuned to the workload you
ran in the GEN phase.
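
Because the USE build depends on data left behind by the GEN phase, a
quick check before configuring can save a wasted build. This is a rough
sketch under the assumption that the profile lives somewhere below the
GEN build directory; ``has_profile_data`` is a hypothetical helper, not a
LIBRA command.

.. code-block:: bash

   # Succeed if DIR holds profile data the USE phase could consume:
   # a merged .profdata (Clang) or at least one .gcda file (GCC).
   has_profile_data() {
     local dir=$1
     [ -n "$(find "$dir" \( -name '*.profdata' -o -name '*.gcda' \) -type f 2>/dev/null | head -n 1)" ]
   }

   # Example:
   #   has_profile_data build/pgo-gen || echo "run the GEN workload first" >&2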

.. note::

   For Clang, LIBRA passes ``-fprofile-use=build/pgo-gen/default.profdata``
   by default. If you merged to a different path, set
   :cmake:variable:`LIBRA_PGO_PROFILE_PATH` in your ``pgo-use`` preset's
   ``cacheVariables``.

5. Verify the improvement
==========================

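Time the optimised binary with your benchmark. The instrumented binary
is shown for reference only: instrumentation makes it slower than a
normal build, so for a fair baseline also time a regular Release build
configured with PGO disabled: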

.. code-block:: bash

   # Instrumented binary (GEN phase — slower due to instrumentation)
   time ./build/pgo-gen/my_application --benchmark

   # Optimised binary (USE phase)
   time ./build/pgo-use/my_application --benchmark

Gains depend on the compiler, the code and how closely the workload
matches real use. Improvements of 5–20% are typical for CPU-bound
workloads; memory-bound workloads usually see smaller gains.
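
To turn two timings into a single figure, the percentage reduction in
run time is ``(baseline - optimised) / baseline * 100``. The helper
below is a small sketch of that arithmetic; ``pct_faster`` is a
hypothetical name, and the times in the example are made up purely to
show the calculation.

.. code-block:: bash

   # Percentage reduction in run time going from baseline to optimised.
   # Both arguments are wall-clock times in seconds.
   pct_faster() {
     awk -v base="$1" -v opt="$2" \
       'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
   }

   # Example: a 10.0 s baseline and an 8.0 s PGO run
   #   pct_faster 10.0 8.0    # prints 20.0

Averaging several runs of each binary before comparing reduces the
influence of run-to-run noise.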

Common issues
=============

**"Profile data not found" during USE phase**
   The compiler looks for profile data relative to the build directory.
   Make sure the GEN binary actually ran and wrote its data to the
   expected location. For Clang, verify that the merged ``.profdata``
   file exists at ``build/pgo-gen/default.profdata``, or at the path set
   in :cmake:variable:`LIBRA_PGO_PROFILE_PATH`.

**"Profile data out of date" warnings**
   The source changed between the GEN and USE builds. The compiler
   falls back to non-PGO optimisation for affected functions. Re-run
   the GEN phase with the current source before rebuilding with USE.

**Low workload coverage**
   If the workload exercises only 20% of the code, PGO can only help
   that 20%. Test suites tend to cover more code paths than a single
   benchmark run, so consider running the full test suite as the GEN
   workload if individual benchmarks are insufficient.
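
For the "out of date" case above, a rough way to anticipate the warning
is to list source files modified after the profile was written. This is
only a heuristic sketch: the compilers detect mismatches from the code
itself rather than from timestamps, so the list may over-report.
``stale_sources``, the file extensions and the ``src`` directory are
assumptions for illustration.

.. code-block:: bash

   # List source files modified more recently than the profile, i.e. the
   # files most likely to trigger "out of date" warnings in the USE build.
   stale_sources() {
     local profile=$1 src_dir=$2
     find "$src_dir" -type f \
       \( -name '*.c' -o -name '*.cc' -o -name '*.cpp' \
          -o -name '*.h' -o -name '*.hpp' \) \
       -newer "$profile"
   }

   # Example (src is a hypothetical source directory):
   #   stale_sources build/pgo-gen/default.profdata src

An empty result suggests the profile is at least as recent as every
listed source file.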