{"id":606577,"date":"2025-07-14T17:11:24","date_gmt":"2025-07-14T22:11:24","guid":{"rendered":"https:\/\/towardsdatascience.com\/?p=606577"},"modified":"2025-07-14T17:12:20","modified_gmt":"2025-07-14T22:12:20","slug":"dynamic-inventory-optimization-with-censored-demand-2","status":"publish","type":"post","link":"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/","title":{"rendered":"Dynamic Inventory Optimization with Censored Demand"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><mdspan datatext=\"el1752530776655\" class=\"mdspan-comment\">We often make decisions<\/mdspan> under uncertainty. Not just once, but in a sequence over time. We rely on our past experiences and expectations of the future to make the most informed and optimal choices possible.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Think of a business that offers several products. These products are procured at a cost and sold for a profit. However, unsold inventory may incur a restocking fee, may carry salvage value, or in some cases, must be scrapped entirely.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Businesses, therefore, faces a crucial question: how much to stock? This decision must often be made before demand is fully known; that is, under censored demand. If the business overstocks, it observes the full demand, since all customer requests are fulfilled. 
But if it understocks, it only sees that demand exceeded supply and the actual demand remains unknown, making it a censored observation.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/image-42-683x1024.png\" alt=\"\" class=\"wp-image-607781\" style=\"width:307px;height:auto\"\/><figcaption class=\"wp-element-caption\">Image Generated by Author via DALL-E<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">This type of problem is often referred to as a <strong>Newsvendor Model<\/strong>. In fields such as operations research and applied mathematics, the optimal stocking decision has been studied by framing it as a classic newspaper stocking problem; hence the name.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this article, we explore a <strong>Sequential Decision-Making<\/strong> framework for the stocking problem under uncertainty and develop a dynamic optimization algorithm using Bayesian learning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Our approach closely follows the framework laid out by <a href=\"https:\/\/castle.princeton.edu\/wp-content\/uploads\/2019\/10\/Powell-Reinforcement-Learning-and-Stochastic-Optimization.pdf\" data-type=\"link\" data-id=\"https:\/\/castle.princeton.edu\/wp-content\/uploads\/2019\/10\/Powell-Reinforcement-Learning-and-Stochastic-Optimization.pdf\">Warren B. 
Powell, Reinforcement Learning and Stochastic Optimization (2019)<\/a> and implements the paper from <a href=\"https:\/\/people.orie.cornell.edu\/pfrazier\/pub\/learning_newsvendor.pdf\" data-type=\"link\" data-id=\"https:\/\/people.orie.cornell.edu\/pfrazier\/pub\/learning_newsvendor.pdf\">Negoescu, Powell, and Frazier (2011), Optimal Learning Policies for the Newsvendor Problem with Censored Demand and Unobservable Lost Sales<\/a>, published in Operations Research.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Problem Setup<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Following a similar setup to that of Negoescu et al., we frame the problem as optimizing the inventory level for a single item over  a sequence of time steps. The cost and selling price are considered fixed. Unsold inventory is discarded with no salvage value, while each unit sold generates revenue. Demand is unknown, and when the available stock is less than actual demand, the demand observation is considered censored. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Demand \\( W \\) in each period is drawn from an exponential distribution with an unknown rate parameter, for simulation purposes.<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\begin{aligned}<br \/>\nx &#038;\\in \\mathbb{R}_+ &#038;&#038;: \\text{Order quantity (decision variable)} \\\\<br \/>\nW &#038;\\sim \\mathrm{Exponential}(\\lambda) &#038;&#038;: \\text{Random demand with unknown rate parameter } \\lambda \\\\<br \/>\n\\lambda &#038;&#038;&#038;: \\text{Demand rate (unknown, to be estimated)} \\\\<br \/>\nc &#038;&#038;&#038;: \\text{Unit cost to procure or produce the item} \\\\<br \/>\np &#038;&#038;&#038;: \\text{Unit selling price (assume } p > c \\text{ for profitability)}<br \/>\n\\end{aligned}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The parameter \\(\\lambda\\) in the exponential distribution represents the <strong>rate of demand<\/strong>; that is, how quickly demand events occur. 
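<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As a quick sanity check on this demand model, we can sample exponential demand and confirm that the empirical mean approaches \\( 1\/\\lambda \\). A minimal sketch with an assumed rate \\( \\lambda = 0.5 \\):<\/p>\n\n\n\n

```python
import random

random.seed(42)

true_lambda = 0.5  # assumed demand rate, for illustration only
n_samples = 100_000

# Draw demand realizations W ~ Exponential(true_lambda)
demands = [random.expovariate(true_lambda) for _ in range(n_samples)]

empirical_mean = sum(demands) / n_samples
print(f"Empirical mean demand: {empirical_mean:.2f}")  # close to 1 / 0.5 = 2.0
```

\n\n\n\n<p class=\"wp-block-paragraph\">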
The <strong>average demand<\/strong> is given by \\(\\mathbb{E}[W] = \\frac{1}{\\lambda}\\).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/image-95.png\" alt=\"\" class=\"wp-image-608818\" style=\"width:487px;height:auto\"\/><figcaption class=\"wp-element-caption\">Image by Author<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We can observe from the <strong>Probability Density Function<\/strong> (PDF) of the Exponential distribution that higher values of demand \\(W\\) become less likely. Thus, the Exponential distribution serves as an appropriate choice for demand modeling.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sequential Decision Formulation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We formulate the inventory control problem as a sequential decision process under uncertainty. The goal is to maximize total expected profit over a finite time horizon \\( N \\), while learning the unknown demand rate by applying Bayesian learning principles.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We define a model with an initial state and a probabilistic belief about future states over time. At each time step, the model makes a decision based on a policy that maps its current belief to an action. 
The goal is to find the <strong>optimal policy<\/strong> that maximizes a predefined reward function.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After taking an action, the model observes the resulting state and updates its belief accordingly, continuing this cycle of decision, observation, and belief update.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/diagram.drawio-2.png\" alt=\"\" class=\"wp-image-608398\" style=\"width:246px;height:auto\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">1) State Variable<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We model demand in each period as a random variable drawn from an Exponential distribution with an unknown rate parameter \\( \\lambda \\). Since \\( \\lambda \\) is not directly observable, we encode our uncertainty about its value using a Gamma prior: <\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\lambda \\sim \\mathrm{Gamma}(a_0, b_0)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The parameters \\( a_0 \\) and \\( b_0 \\) define the shape and rate of our initial belief about the demand rate. These two parameters serve as our <strong>state variables<\/strong>. At each time step, they summarize all past information and are updated as new demand observations become available.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As we collect more data, the posterior distribution over \\(\\lambda\\) evolves from a wide and uncertain shape to a narrower and more confident one, gradually concentrating around the true demand rate. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This process is captured naturally by the Gamma distribution, which flexibly adjusts its shape based on the amount of information we&#8217;ve seen. Early on, the distribution is diffuse, signaling high uncertainty. 
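<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This sharpening can be seen directly from the Gamma moments: the mean is \\( a\/b \\) and the standard deviation is \\( \\sqrt{a}\/b \\), so as both parameters grow with accumulating observations, the spread of the belief shrinks while the mean can stay put. A small illustration with assumed parameter values:<\/p>\n\n\n\n

```python
import math

# Illustrative (a, b) pairs, as if updated after 0, 10, and 100 observations
beliefs = [(1.0, 2.0), (11.0, 22.0), (101.0, 202.0)]

for a, b in beliefs:
    mean = a / b             # E[lambda] = a / b
    std = math.sqrt(a) / b   # standard deviation of Gamma(a, b)
    print(f"a={a:6.1f}, b={b:6.1f} -> mean={mean:.3f}, std={std:.3f}")
```

\n\n\n\n<p class=\"wp-block-paragraph\">Each pair keeps the same posterior mean (0.5) while the standard deviation drops, mirroring the diffuse-to-sharp progression described here.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">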
As observations accumulate, the belief becomes sharper, allowing for more reliable and responsive decision-making. The Probability Density Function (PDF) of the Gamma distribution is shown below:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/image-96.png\" alt=\"\" class=\"wp-image-608820\" style=\"width:470px;height:auto\"\/><figcaption class=\"wp-element-caption\">Image by Author<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We will later define a <strong>transition function<\/strong> that updates the state, that is, how \\( (a_n, b_n) \\) evolves to \\( (a_{n+1}, b_{n+1}) \\), based on newly observed data. This allows the model to continuously refine its belief about demand and make more informed inventory decisions over time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Note that the expected value of the Gamma distribution is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\mathbb{E}[\\lambda] = \\frac{a}{b}<br \/>\n\\]<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) Decision Variable<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong>decision variable<\/strong> at time \\( n \\) is the stocking level:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nx_n \\in \\mathbb{R}_+<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is the number of units to order before demand \\( W_{n+1} \\) is realized. The decision depends only on the current belief \\( (a_n, b_n) \\).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3) Exogenous Information<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">After selecting \\( x_n \\), demand \\( W_{n+1} \\) is revealed:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nW_{n+1} \\sim \\text{Exp}(\\lambda)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since \\( \\lambda \\) is unknown, demand is random. 
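<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In code, censoring simply means we record \\( \\min(W_{n+1}, x_n) \\) together with a flag for whether the stock limit was hit. A minimal sketch (the function name is illustrative):<\/p>\n\n\n\n

```python
def observe_demand(true_demand: float, stock: float) -> tuple:
    """Return (observed quantity, censored flag) for one period."""
    censored = true_demand >= stock
    observed = stock if censored else true_demand
    return observed, censored

print(observe_demand(3.2, 5.0))  # (3.2, False): full demand is seen
print(observe_demand(7.8, 5.0))  # (5.0, True): we only learn demand >= 5.0
```

\n\n\n\n<p class=\"wp-block-paragraph\">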
Observations are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Uncensored<\/strong> if \\( W_{n+1} &lt; x_n \\) (we observe the actual demand)<\/li>\n\n\n\n<li class=\"wp-block-list-item\"><strong>Censored<\/strong> if \\( W_{n+1} \\ge x_n \\) (we only know that demand reached or exceeded the stock level)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This censoring limits the information available for belief updating. Even though the full demand isn\u2019t observed, the censored observation still carries valuable information and should not be ignored in our modeling approach.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4) Transition Function<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The transition function defines how the model\u2019s belief, represented by the state variables, is updated over time. It maps the prior state to the expected future state, and in our case, this update is governed by Bayesian learning.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Bayesian Uncertainty Modelling<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Bayes\u2019 theorem combines prior belief with observed data to form a posterior distribution. 
This updated distribution reflects both prior knowledge and the newly observed information.<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\np_{n+1}(\\lambda \\mid w_{n+1}) = \\frac{p(w_{n+1} \\mid \\lambda) \\cdot p_n(\\lambda)}{p(w_{n+1})}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Where:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\np(w_{n+1} \\mid \\lambda) : \\text{ Likelihood of new observation at time } n+1<br \/>\n\\]<\/p>\n\\[<br \/>\np_n(\\lambda) : \\text{ Prior at time } n<br \/>\n\\]\n\\[<br \/>\np(w_{n+1}) : \\text{ Marginal likelihood (normalizing constant) at time } n+1<br \/>\n\\]\n\\[<br \/>\np_{n+1}(\\lambda \\mid w_{n+1}) : \\text{ Posterior after observing } w_{n+1}<br \/>\n\\]\n\n\n\n<p class=\"wp-block-paragraph\">We set up our problem such that in each period, demand W is drawn from an Exponential distribution. Prior belief over \u03bb will be modelled using a Gamma distribution.<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\np_{n+1}(\\lambda \\mid w_{n+1})<br \/>\n=<br \/>\n\\frac{<br \/>\n\\underbrace{\\lambda e^{-\\lambda w_{n+1}}}_{\\text{Likelihood}}<br \/>\n\\cdot<br \/>\n\\underbrace{\\frac{b_n^{a_n}}{\\Gamma(a_n)} \\lambda^{a_n &#8211; 1} e^{-b_n \\lambda}}_{\\text{Prior (Gamma)}}<br \/>\n}{<br \/>\n\\underbrace{<br \/>\n\\int_0^\\infty \\lambda e^{-\\lambda w_{n+1}} \\cdot \\frac{b_n^{a_n}}{\\Gamma(a_n)} \\lambda^{a_n &#8211; 1} e^{-b_n \\lambda} \\, d\\lambda<br \/>\n}_{\\text{Marginal (evidence)}}<br \/>\n}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The Gamma and Exponential distributions form a well-known <strong>conjugate prior<\/strong> in Bayesian statistics. When using a Gamma prior and an Exponential likelihood, the resulting posterior is also a Gamma distribution. This property of the prior and posterior belonging to the same distributional family is what defines a conjugate prior. 
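<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This conjugate update can be verified numerically: a brute-force grid posterior (likelihood times prior, renormalized) should match \\( \\mathrm{Gamma}(a_0 + 1,\\ b_0 + w) \\). A sketch with assumed prior parameters and a single observation:<\/p>\n\n\n\n

```python
import math

a0, b0 = 2.0, 4.0  # assumed Gamma prior parameters
w = 1.5            # one (uncensored) demand observation

# Brute-force grid posterior: Exponential likelihood times Gamma prior
lams = [i * 0.001 for i in range(1, 20_000)]
unnorm = [
    (lam * math.exp(-lam * w)) * (lam ** (a0 - 1) * math.exp(-b0 * lam))
    for lam in lams
]
z = sum(unnorm)
post_mean_grid = sum(l * u for l, u in zip(lams, unnorm)) / z

# Closed-form conjugate update: Gamma(a0 + 1, b0 + w)
post_mean_conj = (a0 + 1) / (b0 + w)

print(f"grid mean: {post_mean_grid:.4f}, conjugate mean: {post_mean_conj:.4f}")
```

\n\n\n\n<p class=\"wp-block-paragraph\">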
This property simplifies Bayesian updating significantly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For reference, closed-form conjugate updates like this can be found in standard conjugate prior tables, such as the one on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Conjugate_prior\" target=\"_blank\" rel=\"noopener\">Wikipedia<\/a>. Using this reference, we can formulate the posterior. Let:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\lambda \\sim \\mathrm{Gamma}(a_0, b_0) \\quad : \\text{ Prior}<br \/>\n\\]<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nw \\sim \\mathrm{Exp}(\\lambda) \\quad : \\text{ Likelihood}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For <em>n<\/em> independent observations \\( w_1, \\dots, w_n \\), the Gamma prior and Exponential likelihood result in a Gamma posterior:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\lambda \\mid w_1, \\dots, w_n \\sim \\mathrm{Gamma}\\left(a_0 + n,\\ b_0 + \\sum_{i=1}^n w_i\\right)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After observing a single (uncensored) demand \\( w \\), the posterior simplifies to:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\lambda \\mid w \\sim \\mathrm{Gamma}(a_0 + 1,\\ b_0 + w)<br \/>\n\\]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">The shape parameter increases by <strong>1<\/strong> because one new data point has been observed.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">The rate parameter increases by <strong>\\( w \\)<\/strong> because the Exponential likelihood includes the term \\( e^{-\\lambda w} \\), which combines with the prior\u2019s exponential term and adds to the total exponent.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">The Update Function<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The posterior parameters (state variables) are updated based on the nature of the 
observation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Uncensored<\/strong> (\\(  W_{n+1} &lt; x_n \\)):<\/li>\n<\/ul>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n  a_{n+1} = a_n + 1, \\quad b_{n+1} = b_n + W_{n+1}<br \/>\n\\]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Censored<\/strong> (\\( W_{n+1} \\ge x_n \\)):<\/li>\n<\/ul>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n  a_{n+1} = a_n, \\quad b_{n+1} = b_n + x_{n}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These updates reflect how each observation, full or partial, informs the posterior belief over \\( \\lambda \\).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We can define the transition function in Python as below:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from typing import Tuple\n\ndef transition_a_b(\n    a_n: float,\n    b_n: float,\n    x_n: float,\n    W_n1: float\n) -&gt; Tuple[float, float]:\n    &quot;&quot;&quot;\n    Updates the posterior parameters (a, b) after observing demand.\n\n    Args:\n        a_n (float): Current shape parameter of Gamma prior.\n        b_n (float): Current rate parameter of Gamma prior.\n        x_n (float): Order quantity at time n.\n        W_n1 (float): Observed demand at time n+1 (may be censored).\n\n    Returns:\n        Tuple[float, float]: Updated (a_{n+1}, b_{n+1}) values.\n    &quot;&quot;&quot;\n    if W_n1 &lt; x_n:\n        # Uncensored: full demand observed\n        a_n1 = a_n + 1\n        b_n1 = b_n + W_n1\n    else:\n        # Censored: only know that W &gt;= x\n        a_n1 = a_n\n        b_n1 = b_n + x_n\n\n    return a_n1, b_n1<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">5) Objective Function<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The model seeks a policy \\( \\pi \\), mapping beliefs to stocking decisions in order to maximize total expected profit.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li 
class=\"wp-block-list-item\">Profit from ordering \\( x_n \\) units and facing demand \\( W_{n+1} \\):<\/li>\n<\/ul>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nF(x_n, W_{n+1}) = p \\cdot \\min(x_n, W_{n+1}) &#8211; c \\cdot x_n<br \/>\n\\]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">The cumulative objective is:<\/li>\n<\/ul>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\max_\\pi \\mathbb{E} \\left[ \\sum_{n=0}^{N-1} F(x_n, W_{n+1}) \\right]<br \/>\n\\]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\\( \\pi \\) maps \\( (a_n, b_n) \\) to \\( x_n \\)<\/li>\n\n\n\n<li class=\"wp-block-list-item\">\\( p \\) is the selling price per unit sold<\/li>\n\n\n\n<li class=\"wp-block-list-item\">\\( c \\) is the unit cost of ordering<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Unsold units are discarded with no salvage value<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Note that this objective function maximizes only the expected immediate reward across the entire time horizon. In the next section, we introduce an expanded version that incorporates the value of future learning. 
This encourages the model to explore, accounting for the information that censored demand can reveal over time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We can define the profit function in Python as below:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def profit_function(x: float, W: float, p: float, c: float) -&gt; float:\n    &quot;&quot;&quot;\n    Profit function defined as:\n\n        F(x, W) = p * min(x, W) - c * x\n\n    This represents the reward received when fulfilling demand W with inventory x,\n    earning price p per unit sold and incurring cost c per unit ordered.\n\n    Args:\n        x (float): Inventory level \/ decision variable.\n        W (float): Realized demand.\n        p (float, optional): Unit selling price.\n        c (float, optional): Unit cost.\n\n    Returns:\n        float: The profit (reward) for this period.\n    &quot;&quot;&quot;\n    return p * min(x, W) - c * x<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Policy Functions<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We will define several policy functions as defined by Negoescu et al, which will update the value of \\(x_{n+1}\\) (stocking level) based on our current belief of the state \\((a_{n}, b_{n})\\). 
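<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Before turning to the individual policies, the full decision, observation, and update loop can be sketched end to end. The snippet below inlines the transition and profit logic defined earlier and uses a naive constant-order rule as a stand-in policy; all parameter values are illustrative:<\/p>\n\n\n\n

```python
import random

random.seed(0)

p, c = 5.0, 2.0    # selling price and unit cost (illustrative)
true_lambda = 0.5  # the "true" demand rate, unknown to the decision maker
a, b = 1.0, 1.0    # initial Gamma belief over lambda
x = 3.0            # naive constant order quantity (stand-in policy)
total_profit = 0.0

for n in range(50):
    W = random.expovariate(true_lambda)    # latent demand for this period
    total_profit += p * min(x, W) - c * x  # immediate profit F(x, W)
    if W < x:
        a, b = a + 1, b + W                # uncensored: full demand observed
    else:
        a, b = a, b + x                    # censored: only know W >= x

print(f"Posterior mean of lambda: {a / b:.3f} (true value: {true_lambda})")
print(f"Total profit over 50 periods: {total_profit:.1f}")
```

\n\n\n\n<p class=\"wp-block-paragraph\">Even with a fixed order quantity, the belief \\( (a_n, b_n) \\) drifts toward the true demand rate, because both censored and uncensored periods contribute information.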
<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Point Estimate Policy<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Under this policy, the model estimates the unknown demand rate \\(\\lambda\\) using the current posterior and chooses an order quantity \\( x_n \\) to maximize the immediate expected profit.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At time \\(n\\), the mean of the current posterior \\( \\lambda \\sim \\mathrm{Gamma}(a_n, b_n) \\) is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\hat{\\lambda}_n = \\frac{a_n}{b_n}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We treat this estimate as the &#8220;true&#8221; value of \\(\\lambda\\) and assume demand \\(W \\sim \\text{Exp}(\\hat{\\lambda}_n)\\).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Expected Value<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The profit for order quantity \\(x\\) and realized demand \\(W\\) is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nF(x, W) = p \\cdot \\min(x, W) &#8211; c \\cdot x<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We seek to maximize the expected profit. 
<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\max_{x \\geq 0} \\quad \\mathbb{E}_W \\left[ p \\min(x, W) &#8211; c x \\right]<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The expected value of a random variable is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\mathbb{E}[X] = \\int_{-\\infty}^{\\infty} x \\cdot f(x) \\, dx<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Thus, the objective function can be written as:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\max_{x \\geq 0} \\left[ p \\left( \\int_0^x w f_W(w) \\, dw + x \\int_x^\\infty f_W(w) \\, dw \\right) &#8211; c x \\right]<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\\(f_W(x)\\): Probability density function (PDF) of demand evaluated at \\(x\\)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The PDF of \\(\\mathrm{Exponential}(\\hat{\\lambda}_n)\\) is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nf_W(w) = \\hat{\\lambda}_n e^{-\\hat{\\lambda}_n w}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This can be solved as:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\mathbb{E}[F(x, W)] = p \\cdot \\frac{1 &#8211; e^{-\\hat{\\lambda}_n x}}{\\hat{\\lambda}_n} &#8211; c x<br \/>\n\\]<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">First Order Optimality Condition<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">We set the derivative of the expected profit function to zero, and solve for \\(x\\) to find the stocking level that maximizes the expected profit:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\frac{d}{dx} \\mathbb{E}[F(x, W)] = p e^{-\\hat{\\lambda}_n x} &#8211; c = 0<br \/>\n\\]<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\ne^{-\\hat{\\lambda}_n x^*} = \\frac{c}{p}<br \/>\n\\quad \\Rightarrow \\quad<br \/>\nx^* = \\frac{1}{\\hat{\\lambda}_n} \\log\\left( \\frac{p}{c} \\right)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Substitute 
\\(\\hat{\\lambda}_n = \\frac{a_n}{b_n}\\):<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nx_n = \\frac{b_n}{a_n} \\log\\left( \\frac{p}{c} \\right)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Python implementation:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import math\n\ndef point_estimate_policy(\n    a_n: float,\n    b_n: float,\n    p: float,\n    c: float\n) -&gt; float:\n    &quot;&quot;&quot;\n    Point Estimate Policy, chooses x_n based on posterior mean at time n.\n\n    Args:\n        a_n (float): Gamma shape parameter at time n.\n        b_n (float): Gamma rate parameter at time n.\n        p (float): Selling price per unit.\n        c (float): Unit cost.\n\n    Returns:\n        float: Stocking level x_n\n    &quot;&quot;&quot;\n    lambda_hat = a_n \/ b_n\n    return (1 \/ lambda_hat) * math.log(p \/ c)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">2) Distribution Policy<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The Distribution Policy optimizes the expected immediate profit by integrating over the entire current belief distribution of the demand rate \\(\\lambda\\). 
Unlike the Point Estimate Policy, it does not collapse the posterior to a single value.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At time \\(n\\), the belief about \\(\\lambda\\) is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\lambda \\sim \\text{Gamma}(a_n, b_n)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Demand is modelled as:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nW \\sim \\text{Exp}(\\lambda)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This policy chooses the order quantity \\(x_{n}\\) by solving:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nx_n = \\arg\\max_{x \\ge 0} \\ \\mathbb{E}_{\\lambda \\sim \\text{Gamma}(a_n, b_n)} \\left[ \\mathbb{E}_{W \\sim \\text{Exp}(\\lambda)} \\left[ p \\cdot \\min(x, W) &#8211; c x \\right] \\right]<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is the expected immediate profit, averaged over both the uncertainty in demand <strong>and<\/strong> the uncertainty in \\(\\lambda\\).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Expected Value<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">From the previous policy, we know that for a fixed rate \\(\\lambda\\):<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\mathbb{E}_W[\\min(x, W)] =  \\frac{1 &#8211; e^{-\\lambda x}}{\\lambda}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Thus:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\mathbb{E}_{\\lambda} \\left[ \\mathbb{E}_{W \\mid \\lambda}[\\min(x, W)] \\right]<br \/>\n= \\mathbb{E}_{\\lambda} \\left[ \\frac{1 &#8211; e^{-\\lambda x}}{\\lambda} \\right]<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If we denote the Gamma density as:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nf(\\lambda) = \\frac{b^a}{\\Gamma(a)} \\lambda^{a &#8211; 1} e^{-b \\lambda}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Then the expectation 
becomes:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\mathbb{E}_\\lambda \\left[ \\frac{1 &#8211; e^{-\\lambda x}}{\\lambda} \\right]<br \/>\n=\\int_0^\\infty \\frac{1 &#8211; e^{-\\lambda x}}{\\lambda} f(\\lambda) \\, d\\lambda<br \/>\n= \\frac{b^a}{\\Gamma(a)} \\int_0^\\infty (1 &#8211; e^{-\\lambda x}) \\lambda^{a &#8211; 2} e^{-b \\lambda} \\, d\\lambda<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Without going over the full proof, the expectation evaluates to:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\mathbb{E}[\\text{Profit}] = p \\cdot \\mathbb{E}_{\\lambda} \\left[ \\frac{1 &#8211; e^{-\\lambda x}}{\\lambda} \\right] &#8211; c x<br \/>\n= p \\cdot \\frac{b}{a &#8211; 1} \\left(1 &#8211; \\left( \\frac{b}{b + x} \\right)^{a &#8211; 1} \\right) &#8211; c x<br \/>\n\\]<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">First Order Optimality Condition<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Again, we set the derivative of the expected profit function to zero, and solve for \\(x\\) to find the stocking level that maximizes the expected profit:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\frac{d}{dx} \\mathbb{E}[\\text{Profit}]<br \/>\n= \\frac{d}{dx} \\left[ p \\cdot \\frac{b}{a &#8211; 1} \\left(1 &#8211; \\left( \\frac{b}{b + x} \\right)^{a &#8211; 1} \\right) &#8211; c x \\right] = 0<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Without going over the proof, the closed-form expression based on Negoescu et al.&#8217;s paper is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nx_n = b_n \\left( \\left( \\frac{p}{c} \\right)^{1\/a_n} &#8211; 1 \\right)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Python implementation:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def distribution_policy(\n    a_n: float,\n    b_n: float,\n    p: float,\n    c: float\n) -&gt; float:\n    &quot;&quot;&quot;\n    Distribution Policy, chooses x_n by integrating over full posterior at time 
n.\n\n    Args:\n        a_n (float): Gamma shape parameter at time n.\n        b_n (float): Gamma rate parameter at time n.\n        p (float): Selling price per unit.\n        c (float): Unit cost.\n\n    Returns:\n        float: Stocking level x_n\n    &quot;&quot;&quot;\n    return b_n * ((p \/ c) ** (1 \/ a_n) - 1)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">3) Knowledge Gradient (KG) Policy<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The Knowledge Gradient (KG) policy is a Bayesian learning policy that balances <em>exploitation<\/em> (maximizing immediate profit) and <em>exploration<\/em> (ordering to gain information about demand for future decisions).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Instead of just maximizing today&#8217;s profit, KG chooses the order quantity that maximizes:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Profit now + Value of information gained for the future<\/strong><\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nx_n = \\arg\\max_x \\ \\mathbb{E}\\left[ p \\cdot \\min(x, W_{n+1}) &#8211; c x + V(a_{n+1}, b_{n+1}) \\mid a_n, b_n, x \\right]<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\\(W_{n+1} \\sim \\text{Exp}(\\lambda)\\) (with \\(\\lambda \\sim \\text{Gamma}(a_n, b_n)\\))<\/li>\n\n\n\n<li class=\"wp-block-list-item\">\\(V(a_{n+1}, b_{n+1})\\) is the value of expected future profits under updated beliefs after observing \\(W_{n+1}\\)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">We do not know \\(a_{n+1}, b_{n+1}\\) at time \\(n\\) because we haven\u2019t yet observed demand. 
So, we compute their expected value under the possible observation outcomes (censored vs uncensored).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The KG policy then evaluates each candidate stocking quantity \\(x\\) by:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Simulating its effect on posterior beliefs<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Computing the immediate profit<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Computing the value of future learning based on belief updates<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Objective Function<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">We define the total value of choosing \\(x\\) at time \\(n\\) as:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nF_{\\text{KG}}(x) = \\underbrace{\\mathbb{E}[p \\cdot \\min(x, W) &#8211; c x]}_{\\text{Immediate profit}} + \\underbrace{(N &#8211; n) \\cdot \\mathbb{E}_{\\text{posterior}} \\left[ \\max_{x&#8217;} \\mathbb{E}_{\\lambda \\sim \\text{posterior}}[ p \\cdot \\min(x&#8217;, W) &#8211; c x&#8217; ] \\right]}_{\\text{Value of learning}}<br \/>\n\\]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">The first term is just expected immediate profit.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">The second term accounts for how this choice improves future profit by sharpening our belief about \\(\\lambda\\).<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Horizon Factor \\((N-n)\\): We will make \\((N-n)\\) more decisions in the future. 
So the value of better decisions due to learning today gets multiplied by this factor.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Posterior Averaging \\(\\mathbb{E}_{\\text{posterior}}[\u22c5]\\): This means we are averaging over all the possible posterior beliefs we might end up with after observing the outcome of demand; because demand is random and possibly censored, we won\u2019t get perfect information, but we will update our belief.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The paper uses the previously discussed Distribution Policy as a proxy for estimating the future value function. Its optimal order quantity and the resulting value are:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nx^*(a, b) = b \\left( \\left( \\frac{p}{c} \\right)^{1\/a} &#8211; 1 \\right), \\qquad<br \/>\nV(a, b) = \\frac{b p}{a &#8211; 1} \\left( 1 &#8211; \\left( \\frac{b}{b + x^*} \\right)^{a &#8211; 1} \\right) &#8211; c x^*<br \/>\n\\]<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Expected Value<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The expected value of \\(V\\) is expressed as below, per Negoescu et al. As the proof of this equation is quite complex, we&#8217;ll not be going over the details. <\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\begin{align*}<br \/>\n\\mathbb{E}[V] &#038;= \\mathbb{E} \\left[ \\mathbb{E} \\left[ b^{n+1} \\left( \\frac{p}{a^{n+1} &#8211; 1} \\left( 1 &#8211; \\left( \\frac{c}{p} \\right)^{1 &#8211; \\frac{1}{a^{n+1}}} \\right) &#8211; c \\left( \\left( \\frac{c}{p} \\right)^{-\\frac{1}{a^{n+1}}} &#8211; 1 \\right) \\right) \\Big| \\lambda \\right] \\Big| a^n, b^n, x^n \\right] \\\\<br \/>\n&#038;= \\mathbb{E} \\left[ \\int_0^{x^n} \\left( b^n + y \\right) \\left( \\frac{p}{a^n} \\left( 1 &#8211; \\left( \\frac{c}{p} \\right)^{1 &#8211; \\frac{1}{a^{n+1}}} \\right) &#8211; c \\left( \\left( \\frac{c}{p} \\right)^{-\\frac{1}{a^{n+1}}} &#8211; 1 \\right) \\right) \\lambda e^{-\\lambda y} \\, dy \\right. \\\\<br \/>\n&#038;\\quad + \\left. 
\\int_{x^n}^{\\infty} \\left( b^n + x^n \\right) \\left( \\frac{p}{a^n &#8211; 1} \\left( 1 &#8211; \\left( \\frac{c}{p} \\right)^{1 &#8211; \\frac{1}{a^n}} \\right) &#8211; c \\left( \\left( \\frac{c}{p} \\right)^{-\\frac{1}{a^n}} &#8211; 1 \\right) \\right) \\lambda e^{-\\lambda y} \\, dy \\right].<br \/>\n\\end{align*}<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since we already know the expected value of the immediate profit function from the previous policies, the total expected value of the KG policy can be expressed as the sum of the immediate and future terms. As the resulting equation is quite long, we&#8217;ll not go over the details, but it can be found in the paper.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">First Order Optimality Condition<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">In this policy as well, we set the derivative of the expected profit function to zero and solve for \\(x\\) to find the stocking quantity that maximizes the expected profit. The closed-form solution given in the paper is:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\nx_n = b_n \\left[ \\left( \\frac{r}{1 + (N &#8211; n) \\cdot \\left( 1 + \\frac{a_n r}{a_n &#8211; 1} &#8211; \\frac{(a_n + 1) r}{a_n} \\right)} \\right)^{-1 \/ a_n} &#8211; 1 \\right]<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\\(r = \\frac{c}{p}\\): Cost-to-price ratio<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Python implementation:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def knowledge_gradient_policy(\n    a_n: float,\n    b_n: float,\n    p: float,\n    c: float,\n    n: int,\n    N: int\n) -&gt; float:\n    &quot;&quot;&quot;\n    Knowledge Gradient Policy, one-step lookahead policy for exponential demand\n    with Gamma(a_n, b_n) posterior.\n\n    Args:\n        a_n (float): Gamma shape parameter at time n.\n        b_n (float): Gamma rate parameter at time 
n.\n        p (float): Selling price per unit.\n        c (float): Procurement cost per unit.\n        n (int): Current period index (0-based).\n        N (int): Total number of periods in the horizon.\n\n    Returns:\n        float: Stocking level x_n\n    &quot;&quot;&quot;\n    a = max(a_n, 1.001)  # Avoid division by zero for small shape values\n    r = c \/ p\n\n    # Denominator of the closed-form solution:\n    # 1 + (N - n) * (1 + a*r \/ (a - 1) - (a + 1)*r \/ a)\n    denom = 1.0 + (N - n) * (1.0 + a * r \/ (a - 1.0) - (a + 1.0) * r \/ a)\n    adjusted_r = min(max(r \/ denom, 1e-4), 0.99)  # Keep the effective ratio in (0, 1)\n\n    return b_n * (adjusted_r ** (-1.0 \/ a) - 1.0)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Monte Carlo Policy Evaluation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To evaluate a policy \\(\\pi\\) in a stochastic environment, we simulate its performance over multiple sample demand paths.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\\(M\\) be the number of independent simulations (demand paths), each denoted \\(\\omega^m\\) for \\(m = 1, 2, \\dots, M\\)<\/li>\n\n\n\n<li class=\"wp-block-list-item\">\\(N\\) be the time horizon<\/li>\n\n\n\n<li class=\"wp-block-list-item\">\\(W_{n+1}(\\omega^m)\\) be the realized demand following the decision at time \\(n\\) on path \\(m\\)<\/li>\n\n\n\n<li class=\"wp-block-list-item\">\\(x_n(\\omega^m)\\) be the decision taken at time \\(n\\) under policy \\(\\pi\\) on path \\(m\\)<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Cumulative Reward on a Single Path<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">For each sample path \\(\\omega^m\\), compute the total reward:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\hat{F}^\\pi(\\omega^m) = \\sum_{n=0}^{N-1} \\left[ p \\cdot \\min\\left(x_n(\\omega^m), W_{n+1}(\\omega^m)\\right) &#8211; c \\cdot x_n(\\omega^m) \\right]<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This represents the realized value of the policy \\(\\pi\\) along that specific trajectory.<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">Python implementation:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import numpy as np\n\ndef simulate_policy(\n    N: int,\n    a_0: float,\n    b_0: float,\n    lambda_true: float,\n    policy_name: str,\n    p: float,\n    c: float,\n    seed: int = 42\n) -&gt; float:\n    &quot;&quot;&quot;\n    Simulates the sequential inventory decision-making process using a specified policy.\n\n    Args:\n        N (int): Number of time periods.\n        a_0 (float): Initial shape parameter of Gamma prior.\n        b_0 (float): Initial rate parameter of Gamma prior.\n        lambda_true (float): True exponential demand rate.\n        policy_name (str): One of {&#039;point_estimate&#039;, &#039;distribution&#039;, &#039;knowledge_gradient&#039;}.\n        p (float): Selling price per unit.\n        c (float): Procurement cost per unit.\n        seed (int): Random seed for reproducibility.\n\n    Returns:\n        float: Total cumulative reward over N periods.\n    &quot;&quot;&quot;\n    np.random.seed(seed)\n    a_n, b_n = a_0, b_0\n    rewards = []\n\n    for n in range(N):\n        # Choose order quantity based on specified policy\n        if policy_name == &quot;point_estimate&quot;:\n            x_n = point_estimate_policy(a_n=a_n, b_n=b_n, p=p, c=c)\n        elif policy_name == &quot;distribution&quot;:\n            x_n = distribution_policy(a_n=a_n, b_n=b_n, p=p, c=c)\n        elif policy_name == &quot;knowledge_gradient&quot;:\n            x_n = knowledge_gradient_policy(a_n=a_n, b_n=b_n, p=p, c=c, n=n, N=N)\n        else:\n            raise ValueError(f&quot;Unknown policy: {policy_name}&quot;)\n\n        # Sample demand\n        W_n1 = np.random.exponential(1 \/ lambda_true)\n\n        # Compute profit and update belief\n        reward = profit_function(x_n, W_n1, p, c)\n        rewards.append(reward)\n\n        a_n, b_n = transition_a_b(a_n, b_n, x_n, W_n1)\n\n    return 
sum(rewards)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Estimate Expected Value by Averaging<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The expected reward of policy \\(\\pi\\) is approximated using the sample average across all \\(M\\) simulations:<\/p>\n\n\n<p class=\"wp-block-shortcode\">\\[<br \/>\n\\bar{F}^\\pi = \\frac{1}{M} \\sum_{m=1}^{M} \\hat{F}^\\pi(\\omega^m)<br \/>\n\\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This \\(\\bar{F}^\\pi\\) is an unbiased estimator of the true expected reward under policy \\(\\pi\\).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Python implementation:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\" datatext=\"\"><code class=\"language-python\">import numpy as np\n\ndef policy_monte_carlo(\n    N_sim: int,\n    N: int,\n    a_0: float,\n    b_0: float,\n    lambda_true: float,\n    policy_name: str,\n    p: float = 10.0,\n    c: float = 4.0,\n    base_seed: int = 42\n) -&gt; float:\n    &quot;&quot;&quot;\n    Runs multiple Monte Carlo simulations to evaluate the average cumulative reward\n    for a given inventory policy under exponential demand.\n\n    Args:\n        N_sim (int): Number of Monte Carlo simulations to run.\n        N (int): Number of time steps in each simulation.\n        a_0 (float): Initial Gamma shape parameter.\n        b_0 (float): Initial Gamma rate parameter.\n        lambda_true (float): True rate of exponential demand.\n        policy_name (str): Name of the policy to use: {&quot;point_estimate&quot;, &quot;distribution&quot;, &quot;knowledge_gradient&quot;}.\n        p (float): Selling price per unit.\n        c (float): Procurement cost per unit.\n        base_seed (int): Seed offset for reproducibility across simulations.\n\n    Returns:\n        float: Average cumulative reward across all simulations.\n    &quot;&quot;&quot;\n    total_rewards = []\n\n    for i in range(N_sim):\n        reward = simulate_policy(\n            N=N,\n            a_0=a_0,\n            b_0=b_0,\n    
        lambda_true=lambda_true,\n            policy_name=policy_name,\n            p=p,\n            c=c,\n            seed=base_seed + i\n        )\n        total_rewards.append(reward)\n\n    return np.mean(total_rewards)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Parameters\nN_sim = 10000 # Number of simulations\nN = 100 # Number of time periods\na_0 = 10.0 # Initial shape parameter of Gamma prior\nb_0 = 5.0 # Initial rate parameter of Gamma prior\nlambda_true = 0.25 # True rate of exponential demand\np = 26.0 # Selling price per unit\nc = 20.0 # Unit cost\nbase_seed = 1234 # Base seed for reproducibility\n\nresults = {\n    policy: policy_monte_carlo(\n        N_sim=N_sim,\n        N=N,\n        a_0=a_0,\n        b_0=b_0,\n        lambda_true=lambda_true,\n        policy_name=policy,\n        p=p,\n        c=c,\n        base_seed=base_seed\n    )\n    for policy in [&quot;point_estimate&quot;, &quot;distribution&quot;, &quot;knowledge_gradient&quot;]\n}\n\nprint(results)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Results<\/h3>\n\n\n<div style=\"display: flex; gap: 10px;\">\n<figure class=\"wp-block-image size-full\" style=\"flex: 1;\"><img decoding=\"async\" class=\"wp-image-608831\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/image-97.png\" alt=\"\" \/><\/figure>\n<figure class=\"wp-block-image size-full\" style=\"flex: 1;\"><img decoding=\"async\" class=\"wp-image-608833\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/image-99.png\" alt=\"\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">The left plot shows how average cumulative profit evolves over time, while the right plot shows the average reward per time step. 
From this simulation, we observe that the Knowledge Gradient (KG) policy significantly outperforms the other two, as it optimizes not only the immediate reward but also the value of future learning. Both the Point Estimate and Distribution policies perform similarly.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/image-102-1024x606.png\" alt=\"\" class=\"wp-image-608837\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We can observe from the plot above that the Bayesian learning algorithm gradually converges to the true mean demand \\(\\mathbb{E}[W]\\).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These findings highlight the importance of incorporating the value of information in sequential decision-making under uncertainty. While simpler heuristics like the Point Estimate and Distribution policies focus solely on immediate gains, the Knowledge Gradient policy leverages future learning potential, yielding superior long-term performance. 
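As a quick sanity check of the closed-form rules above, the short sketch below compares the Distribution Policy quantity with the KG quantity at the start and end of the horizon, using the prior and price parameters from the simulation section. This is a minimal illustration (function names here are illustrative, not from the article's code): at \(n = N\) the horizon factor vanishes and the KG rule collapses to the Distribution Policy, while early in the horizon it stocks noticeably more in order to buy information.

```python
def distribution_policy_x(a: float, b: float, p: float, c: float) -> float:
    # Distribution Policy: x*(a, b) = b * ((p / c) ** (1 / a) - 1)
    return b * ((p / c) ** (1.0 / a) - 1.0)

def kg_policy_x(a: float, b: float, p: float, c: float, n: int, N: int) -> float:
    # Closed-form KG rule from the first-order optimality condition
    r = c / p
    denom = 1.0 + (N - n) * (1.0 + a * r / (a - 1.0) - (a + 1.0) * r / a)
    return b * ((r / denom) ** (-1.0 / a) - 1.0)

# Prior and economics from the simulation section: a_0=10, b_0=5, p=26, c=20, N=100
a0, b0, p, c, N = 10.0, 5.0, 26.0, 20.0, 100

x_dist = distribution_policy_x(a0, b0, p, c)
x_kg_early = kg_policy_x(a0, b0, p, c, n=0, N=N)  # many periods left: stock extra to learn
x_kg_last = kg_policy_x(a0, b0, p, c, n=N, N=N)   # no future periods: pure exploitation

print(f"distribution: {x_dist:.3f}, KG early: {x_kg_early:.3f}, KG last: {x_kg_last:.3f}")
```

Because the learning bonus scales with the remaining horizon \((N - n)\), the gap between the two policies shrinks monotonically as the end of the horizon approaches.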
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A sequential decision framework with Bayesian learning<\/p>\n","protected":false},"author":18,"featured_media":606578,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"is_member_only":false,"sub_heading":"A sequential decision framework with Bayesian learning","footnotes":""},"categories":[44],"tags":[10579,468,2506,746,8131],"sponsor":[],"coauthors":[30469],"class_list":["post-606577","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","tag-bayesian-learning","tag-deep-dives","tag-demand-forecasting","tag-reinforcement-learning","tag-supply-chain-analytics"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Dynamic Inventory Optimization with Censored Demand | Towards Data Science<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Dynamic Inventory Optimization with Censored Demand | Towards Data Science\" \/>\n<meta property=\"og:description\" content=\"A sequential decision framework with Bayesian learning\" \/>\n<meta property=\"og:url\" content=\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/\" \/>\n<meta property=\"og:site_name\" content=\"Towards Data Science\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-14T22:11:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-14T22:12:20+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/82d61016-c99c-4b6d-9c12-26e8da1fdb20.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1536\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Mert Ersoz\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:site\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Mert Ersoz\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/\"},\"author\":{\"name\":\"TDS Editors\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\"},\"headline\":\"Dynamic Inventory Optimization with Censored Demand\",\"datePublished\":\"2025-07-14T22:11:24+00:00\",\"dateModified\":\"2025-07-14T22:12:20+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/\"},\"wordCount\":3630,\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/82d61016-c99c-4b6d-9c12-26e8da1fdb20.jpg\",\"keywords\":[\"Bayesian Learning\",\"Deep Dives\",\"Demand 
Forecasting\",\"Reinforcement Learning\",\"Supply Chain Analytics\"],\"articleSection\":[\"Data Science\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/\",\"url\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/\",\"name\":\"Dynamic Inventory Optimization with Censored Demand | Towards Data Science\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/82d61016-c99c-4b6d-9c12-26e8da1fdb20.jpg\",\"datePublished\":\"2025-07-14T22:11:24+00:00\",\"dateModified\":\"2025-07-14T22:12:20+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/#primaryimage\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/82d61016-c99c-4b6d-9c12-26e8da1fdb20.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/82d61016-c99c-4b6d-9c12-26e8da1fdb20.jpg\",\"width\":1536,\"height\":1024,\"caption\":\"Image Generated by Author via 
DALL-E\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/towardsdatascience.com\/dynamic-inventory-optimization-with-censored-demand-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/towardsdatascience.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Dynamic Inventory Optimization with Censored Demand\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/towardsdatascience.com\/#website\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"name\":\"Towards Data Science\",\"description\":\"Publish AI, ML &amp; data-science insights to a global community of data professionals.\",\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"alternateName\":\"TDS\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/towardsdatascience.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/towardsdatascience.com\/#organization\",\"name\":\"Towards Data Science\",\"alternateName\":\"TDS\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"width\":696,\"height\":696,\"caption\":\"Towards Data 
Science\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/TDataScience\",\"https:\/\/www.youtube.com\/c\/TowardsDataScience\",\"https:\/\/www.linkedin.com\/company\/towards-data-science\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\",\"name\":\"TDS Editors\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"caption\":\"TDS Editors\"},\"description\":\"Building a vibrant data science and machine learning community. Share your insights and projects with our global audience: bit.ly\/write-for-tds\",\"url\":\"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"TDS Contributor Portal","distributor_original_site_url":"https:\/\/contributor.insightmediagroup.io","push-errors":false,"_links":{"self":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/606577","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/comments?post=606577"}],"version-history":[{"count":0,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/606577\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media\/606578"}],"wp:attachment":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media?parent=606577"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/categories?post=606577"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/tags?post=606577"},{"taxonomy":"sponsor","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/sponsor?post=606577"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/coauthors?post=606577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}