Fit probability distribution object to data (2024)

Syntax

pd = fitdist(x,distname)

pd = fitdist(x,distname,Name,Value)

[pdca,gn,gl] = fitdist(x,distname,'By',groupvar)

[pdca,gn,gl] = fitdist(x,distname,'By',groupvar,Name,Value)

Description

pd = fitdist(x,distname) creates a probability distribution object by fitting the distribution specified by distname to the data in column vector x.

pd = fitdist(x,distname,Name,Value) creates the probability distribution object with additional options specified by one or more name-value pair arguments. For example, you can indicate censored data or specify control parameters for the iterative fitting algorithm.

[pdca,gn,gl] = fitdist(x,distname,'By',groupvar) creates probability distribution objects by fitting the distribution specified by distname to the data in x based on the grouping variable groupvar. It returns a cell array of fitted probability distribution objects, pdca, a cell array of group labels, gn, and a cell array of grouping variable levels, gl.

[pdca,gn,gl] = fitdist(x,distname,'By',groupvar,Name,Value) returns the above output arguments using additional options specified by one or more name-value pair arguments. For example, you can indicate censored data or specify control parameters for the iterative fitting algorithm.

Examples

Fit Normal Distribution to Data

Fit a normal distribution to sample data, and examine the fit by using a histogram and a quantile-quantile plot.

Load patient weights from the data file patients.mat.

load patients
x = Weight;

Create a normal distribution object by fitting it to the data.

pd = fitdist(x,'Normal')

pd = 
  NormalDistribution

  Normal distribution
       mu =     154   [148.728, 159.272]
    sigma = 26.5714   [23.3299, 30.8674]

The distribution object display includes the parameter estimates for the mean (mu) and standard deviation (sigma), and the 95% confidence intervals for the parameters.

You can use the object functions of pd to evaluate the distribution and generate random numbers. Display the supported object functions.

methods(pd)

Methods for class prob.NormalDistribution:

cdf  gather  icdf  iqr  mean  median  negloglik  paramci  pdf  plot  proflik  random  std  truncate  var

For example, obtain the 95% confidence intervals by using the paramci function.

ci95 = paramci(pd)

ci95 = 2×2

  148.7277   23.3299
  159.2723   30.8674

Specify the significance level (Alpha) to obtain confidence intervals with a different confidence level. Compute the 99% confidence intervals.

ci99 = paramci(pd,'Alpha',.01)

ci99 = 2×2

  147.0213   22.4257
  160.9787   32.4182

Evaluate and plot the pdf values of the distribution.

x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y)

Create a histogram with the normal distribution fit by using the histfit function. histfit uses fitdist to fit a distribution to data.

histfit(x)

The histogram shows that the data has two modes, and that the mode of the normal distribution fit is between those two modes.

Use qqplot to create a quantile-quantile plot of the quantiles of the sample data x versus the theoretical quantile values of the fitted distribution.

qqplot(x,pd)

The plot is not a straight line, suggesting that the data does not follow a normal distribution.

Fit Kernel Distribution to Data

Load patient weights from the data file patients.mat.
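
A minimal sketch of a kernel fit to these data, assuming the same Weight variable used in the other examples on this page. The evaluation grid 50:1:250 is an arbitrary choice borrowed from those examples, and the default kernel settings are used.

```matlab
% Sketch only: fit a nonparametric kernel distribution to patient weights.
load patients
x = Weight;

% 'Kernel' requests a KernelDistribution object (default normal kernel).
pd = fitdist(x,'Kernel');

% Evaluate the fitted density on a grid and plot it.
x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y)
```

Because a kernel fit does not assume a parametric shape, the plotted density can show more than one mode if the data have them.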

Fit Normal Distributions to Grouped Data

Load patient weights and genders from the data file patients.mat.

load patients
x = Weight;

Create normal distribution objects by fitting them to the data, grouped by patient gender.

[pdca,gn,gl] = fitdist(x,'Normal','By',Gender)

pdca=1×2 cell array
    {1x1 prob.NormalDistribution}    {1x1 prob.NormalDistribution}

gn = 2x1 cell
    {'Male'  }
    {'Female'}

gl = 2x1 cell
    {'Male'  }
    {'Female'}

The cell array pdca contains two probability distribution objects, one for each gender group. The cell array gn contains two group labels. The cell array gl contains two group levels.

View each distribution in the cell array pdca to compare the mean, mu, and the standard deviation, sigma, grouped by patient gender. The order of pdca matches the order of the group labels in gn, so pdca{1} is the fit for 'Male' and pdca{2} is the fit for 'Female'.

male = pdca{1} % Distribution for males

male = 
  NormalDistribution

  Normal distribution
       mu = 180.532   [177.833, 183.231]
    sigma = 9.19322   [7.63933, 11.5466]

female = pdca{2} % Distribution for females

female = 
  NormalDistribution

  Normal distribution
       mu = 130.472   [128.183, 132.76]
    sigma = 8.30339   [6.96947, 10.2736]

Compute the pdf of each distribution.

x_values = 50:1:250;
malepdf = pdf(male,x_values);
femalepdf = pdf(female,x_values);

Plot the pdfs for a visual comparison of weight distribution by gender. Plot the curves in the same order as gn so that the legend entries match.

figure
plot(x_values,malepdf,'LineWidth',2)
hold on
plot(x_values,femalepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off

Fit Kernel Distributions to Grouped Data

Load patient weights and genders from the data file patients.mat.

load patients
x = Weight;

Create kernel distribution objects by fitting them to the data, grouped by patient gender. Use a triangular kernel function.

[pdca,gn,gl] = fitdist(x,'Kernel','By',Gender,'Kernel','triangle');

View each distribution in the cell array pdca to see the kernel distributions for each gender. The order of pdca matches the order of the group labels in gn, which lists 'Male' first for these data.

male = pdca{1} % Distribution for males

male = 
  KernelDistribution

    Kernel = triangle
    Bandwidth = 5.08961
    Support = unbounded

female = pdca{2} % Distribution for females

female = 
  KernelDistribution

    Kernel = triangle
    Bandwidth = 4.25894
    Support = unbounded

Compute the pdf of each distribution.

x_values = 50:1:250;
malepdf = pdf(male,x_values);
femalepdf = pdf(female,x_values);

Plot the pdfs for a visual comparison of weight distribution by gender. Plot the curves in the same order as gn so that the legend entries match.

figure
plot(x_values,malepdf,'LineWidth',2)
hold on
plot(x_values,femalepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off

Input Arguments

`x` — Input data
column vector

Input data, specified as a column vector. fitdist ignores NaN values in x. Additionally, any NaN values in the censoring vector or frequency vector cause fitdist to ignore the corresponding values in x.

Data Types: double
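
The NaN handling described above can be checked with a small sketch. The data values here are made up for illustration; the fit on the raw vector should match the fit on the vector with NaN entries removed by hand.

```matlab
% Sketch: NaN entries in x are dropped before fitting (illustrative data).
x = [2.1; 3.4; NaN; 2.9; 3.8; NaN; 3.1];

pdAll   = fitdist(x,'Normal');             % NaN values ignored by fitdist
pdClean = fitdist(x(~isnan(x)),'Normal');  % NaN values removed manually

% Both fits should report the same parameter estimates.
disp([pdAll.mu pdClean.mu; pdAll.sigma pdClean.sigma])
```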

`distname` — Distribution name
character vector | string scalar

Distribution name, specified as one of the following character vectors or string scalars. The distribution specified by distname determines the type of the returned probability distribution object.

Distribution Name	Description	Distribution Object
`'Beta'`	Beta distribution	BetaDistribution
`'Binomial'`	Binomial distribution	BinomialDistribution
`'BirnbaumSaunders'`	Birnbaum-Saunders distribution	BirnbaumSaundersDistribution
`'Burr'`	Burr distribution	BurrDistribution
`'Exponential'`	Exponential distribution	ExponentialDistribution
`'Extreme Value'` or `'ev'`	Extreme Value distribution	ExtremeValueDistribution
`'Gamma'`	Gamma distribution	GammaDistribution
`'Generalized Extreme Value'` or `'gev'`	Generalized Extreme Value distribution	GeneralizedExtremeValueDistribution
`'Generalized Pareto'` or `'gp'`	Generalized Pareto distribution	GeneralizedParetoDistribution
`'Half Normal'` or `'hn'`	Half-normal distribution	HalfNormalDistribution
`'InverseGaussian'`	Inverse Gaussian distribution	InverseGaussianDistribution
`'Kernel'`	Kernel distribution	KernelDistribution
`'Logistic'`	Logistic distribution	LogisticDistribution
`'Loglogistic'`	Loglogistic distribution	LoglogisticDistribution
`'Lognormal'`	Lognormal distribution	LognormalDistribution
`'Nakagami'`	Nakagami distribution	NakagamiDistribution
`'Negative Binomial'` or `'nbin'`	Negative Binomial distribution	NegativeBinomialDistribution
`'Normal'`	Normal distribution	NormalDistribution
`'Poisson'`	Poisson distribution	PoissonDistribution
`'Rayleigh'`	Rayleigh distribution	RayleighDistribution
`'Rician'`	Rician distribution	RicianDistribution
`'Stable'`	Stable distribution	StableDistribution
`'tLocationScale'`	t Location-Scale distribution	tLocationScaleDistribution
`'Weibull'` or `'wbl'`	Weibull distribution	WeibullDistribution

`groupvar` — Grouping variable
categorical array | logical or numeric vector | character array | string array | cell array of character vectors

Grouping variable, specified as a categorical array, logical or numeric vector, character array, string array, or cell array of character vectors. Each unique value in a grouping variable defines a group.

For example, if Gender is a cell array of character vectors with values 'Male' and 'Female', you can use Gender as a grouping variable to fit a distribution to your data by gender.

More than one grouping variable can be used by specifying a cell array of grouping variables. Observations are placed in the same group if they have common values of all specified grouping variables.

For example, if Smoker is a logical vector with values 0 for nonsmokers and 1 for smokers, then specifying the cell array {Gender,Smoker} divides observations into four groups: Male Smoker, Male Nonsmoker, Female Smoker, and Female Nonsmoker.

Example: {Gender,Smoker}
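
As a sketch of grouping by two variables, the following assumes the Gender and Smoker variables from patients.mat, the data file used in the examples on this page.

```matlab
% Sketch: one normal fit per combination of gender and smoking status.
load patients
[pdca,gn,gl] = fitdist(Weight,'Normal','By',{Gender,Smoker});

numGroups = numel(pdca);   % number of fitted distributions
disp(gn)                   % combined group labels
disp(gl)                   % one column of levels per grouping variable
```

Each element of pdca corresponds to the group label in the same position of gn, so index into pdca by matching against gn rather than assuming a fixed group order.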

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: fitdist(x,'Kernel','Kernel','triangle') fits a kernel distribution object to the data in x using a triangular kernel function.

`Censoring` — Logical flag for censored data
`0` (default) | vector of logical values

Logical flag for censored data, specified as a vector of logical values that is the same size as input vector x. The value is 1 when the corresponding element in x is a right-censored observation and 0 when the corresponding element is an exact observation. The default is a vector of 0s, indicating that all observations are exact.

fitdist ignores any NaN values in this censoring vector. Additionally, any NaN values in x or the frequency vector cause fitdist to ignore the corresponding values in the censoring vector.

This argument is valid only if distname is 'BirnbaumSaunders', 'Burr', 'Exponential', 'ExtremeValue', 'Gamma', 'InverseGaussian', 'Kernel', 'Logistic', 'Loglogistic', 'Lognormal', 'Nakagami', 'Normal', 'Rician', 'tLocationScale', or 'Weibull'.

Data Types: logical
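
A minimal sketch of a right-censored fit. The simulated failure times, the Weibull parameters, and the cutoff of 12 are all assumptions chosen for illustration; only the 'Censoring' argument itself comes from this page.

```matlab
% Sketch: fit a Weibull distribution to right-censored data.
rng(0)                            % reproducible simulation
t = wblrnd(10,2,[100 1]);         % hypothetical true failure times
cutoff = 12;                      % hypothetical end of observation period

censored = t > cutoff;            % 1 = right-censored, 0 = exact
tobs = min(t,cutoff);             % observed (possibly censored) times

pd = fitdist(tobs,'Weibull','Censoring',censored);
```

The censoring vector has the same size as the data, with 1 marking observations known only to exceed the recorded value.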

`Frequency` — Observation frequency
`1` (default) | vector of nonnegative integer values

Observation frequency, specified as a vector of nonnegative integer values that is the same size as input vector x. Each element of the frequency vector specifies the frequencies for the corresponding elements in x. The default is a vector of 1s, indicating that each value in x only appears once.

fitdist ignores any NaN values in this frequency vector. Additionally, any NaN values in x or the censoring vector cause fitdist to ignore the corresponding values in the frequency vector.

Data Types: single | double
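
To illustrate the frequency vector, this sketch fits the same made-up count data twice: once with 'Frequency' and once with the values repeated explicitly. The data are hypothetical, and the two fits are expected to agree.

```matlab
% Sketch: a frequency vector is equivalent to repeating values in x.
vals = [1; 2; 3; 4];     % distinct observed values (illustrative)
freq = [3; 5; 2; 1];     % how many times each value was observed

pdFreq = fitdist(vals,'Poisson','Frequency',freq);
pdFull = fitdist(repelem(vals,freq),'Poisson');   % expanded data

disp([pdFreq.lambda pdFull.lambda])
```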

`Options` — Control parameters
structure

Control parameters for the iterative fitting algorithm, specified as a structure you create using statset.

Data Types: struct
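
A sketch of passing control parameters, assuming a gamma fit to simulated data. The particular MaxIter and TolX values are arbitrary and are not recommendations.

```matlab
% Sketch: adjust iteration limits for an iterative fit.
rng(0)
x = gamrnd(2,3,[200 1]);                     % simulated data (illustrative)

opts = statset('MaxIter',500,'TolX',1e-8);   % options structure from statset
pd = fitdist(x,'Gamma','Options',opts);
```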

`NTrials` — Number of trials for binomial distribution
1 (default) | positive integer value

Number of trials for the binomial distribution, specified as a positive integer value.

This argument is valid only when distname is 'Binomial' (binomial distribution).

Example: 'NTrials',10

Data Types: single | double
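
A minimal sketch of a binomial fit with a known number of trials. The simulated counts and the success probability of 0.3 are assumptions for illustration.

```matlab
% Sketch: fit a binomial distribution with 10 trials per observation.
rng(0)
n = 10;
x = binornd(n,0.3,[50 1]);     % simulated success counts (illustrative)

pd = fitdist(x,'Binomial','NTrials',n);
disp([pd.N pd.p])
```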

`theta` — Location (threshold) parameter for generalized Pareto distribution
scalar value

Location (threshold) parameter for the generalized Pareto distribution, specified as a scalar.

This argument is valid only when distname is 'Generalized Pareto' (generalized Pareto distribution).

The default value is 0 when the sample data x includes only nonnegative values. You must specify theta if x includes negative values.

Example: 'theta',1

Data Types: single | double
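
A sketch of fitting a generalized Pareto distribution with a nonzero threshold. The simulated data and the threshold of 5 are illustrative assumptions; 'gp' is the short name listed in the distname table.

```matlab
% Sketch: fix the threshold of a generalized Pareto fit.
rng(0)
threshold = 5;
x = gprnd(0.1,2,threshold,[200 1]);   % shape, scale, threshold (illustrative)

pd = fitdist(x,'gp','theta',threshold);
```

The threshold is treated as known, so only the shape and scale parameters are estimated.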

`mu` — Location parameter for half-normal distribution
scalar value

Location parameter for the half-normal distribution, specified as a scalar.

This argument is valid only when distname is 'Half Normal' (half-normal distribution).

The default value is 0 when the sample data x includes only nonnegative values. You must specify mu if x includes negative values.

Example: 'mu',1

Data Types: single | double
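
A sketch of a half-normal fit when the data include negative values, so that mu must be supplied. The location of -2 and the simulated data are assumptions for illustration; 'hn' is the short name listed in the distname table.

```matlab
% Sketch: half-normal fit with a known, nonzero location.
rng(0)
loc = -2;
x = loc + abs(randn(200,1));   % simulated data with values below zero

pd = fitdist(x,'hn','mu',loc);
```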

`Kernel` — Kernel smoother type for kernel distribution
`'normal'` (default) | `'box'` | `'triangle'` | `'epanechnikov'`

Kernel smoother type for the kernel distribution, specified as one of the following:

'normal'
'box'
'triangle'
'epanechnikov'

You must specify distname as 'Kernel' to use this option.

`Support` — Kernel density support for kernel distribution
`'unbounded'` (default) | `'positive'` | two-element vector

Kernel density support for the kernel distribution, specified as 'unbounded', 'positive', or a two-element vector.

Value	Description
`'unbounded'`	Density can extend over the whole real line.
`'positive'`	Density is restricted to positive values.

Alternatively, you can specify a two-element vector giving finite lower and upper limits for the support of the density.

You must specify distname as 'Kernel' to use this option.

Data Types: single | double | char | string
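
A sketch of the two restricted-support options, assuming the Weight variable from patients.mat. The limits [50 250] are an arbitrary interval that contains all of the data.

```matlab
% Sketch: restrict the support of a kernel density estimate.
load patients

pdPos = fitdist(Weight,'Kernel','Support','positive');   % positive values only
pdLim = fitdist(Weight,'Kernel','Support',[50 250]);     % finite limits

% The density should be zero outside the specified support.
disp([pdf(pdPos,-1) pdf(pdLim,300)])
```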

`Width` — Bandwidth of kernel smoothing window for kernel distribution
scalar value

Bandwidth of the kernel smoothing window for the kernel distribution, specified as a scalar value. The default value used by fitdist is optimal for estimating normal densities, but you might want to choose a smaller value to reveal features such as multiple modes. You must specify distname as 'Kernel' to use this option.

Data Types: single | double
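
A sketch comparing the default bandwidth with a narrower one, assuming the Weight variable from patients.mat. The width of 4 is an arbitrary choice made only to show the effect.

```matlab
% Sketch: a smaller bandwidth gives a less smooth density estimate.
load patients
x_values = 50:1:250;

pdDefault = fitdist(Weight,'Kernel');
pdNarrow  = fitdist(Weight,'Kernel','Width',4);   % hypothetical narrower window

yDefault = pdf(pdDefault,x_values);
yNarrow  = pdf(pdNarrow,x_values);

plot(x_values,yDefault,x_values,yNarrow)
legend('Default width','Width = 4')
```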

Output Arguments

`pd` — Probability distribution
probability distribution object

Probability distribution, returned as a probability distribution object. The distribution specified by distname determines the class type of the returned probability distribution object. For the list of distname values and corresponding probability distribution objects, see distname.

`pdca` — Probability distribution objects
cell array

Probability distribution objects of the type specified by distname, returned as a cell array. For the list of distname values and corresponding probability distribution objects, see distname.

`gn` — Group labels
cell array of character vectors

Group labels, returned as a cell array of character vectors.

`gl` — Grouping variable levels
cell array of character vectors

Grouping variable levels, returned as a cell array of character vectors containing one column for each grouping variable.

Algorithms

The fitdist function fits most distributions using maximum likelihood estimation. Two exceptions are the normal and lognormal distributions with uncensored data.

For the uncensored normal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance.
For the uncensored lognormal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance of the log of the data.
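
The normal case can be checked with a short sketch, assuming the Weight variable from patients.mat. Here std(x) normalizes by n-1 (the unbiased variance estimate) and std(x,1) normalizes by n (the maximum likelihood estimate).

```matlab
% Sketch: compare the fitted sigma with the two standard deviation estimates.
load patients
x = Weight;

pd = fitdist(x,'Normal');

sigmaUnbiased = std(x);     % square root of the unbiased variance estimate
sigmaMLE      = std(x,1);   % maximum likelihood estimate, for comparison

disp([pd.sigma sigmaUnbiased sigmaMLE])
```

The fitted sigma should agree with sigmaUnbiased and be slightly larger than sigmaMLE.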

Alternative Functionality

The Distribution Fitter app opens a graphical user interface for you to import data from the workspace and interactively fit a probability distribution to that data. You can then save the distribution to the workspace as a probability distribution object. Open the Distribution Fitter app using distributionFitter, or click Distribution Fitter on the Apps tab.
To fit a distribution to left-censored, double-censored, or interval-censored data, use mle. You can find the maximum likelihood estimates by using the mle function, and create a probability distribution object by using the makedist function. For an example, see Find MLEs for Double-Censored Data.

References

[1] Johnson, N. L., S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 1, Hoboken, NJ: Wiley-Interscience, 1993.

[2] Johnson, N. L., S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 2, Hoboken, NJ: Wiley-Interscience, 1994.

[3] Bowman, A. W., and A. Azzalini. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press, 1997.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Supported syntaxes are:
```
pd = fitdist(x,distname)
pd = fitdist(x,distname,Name,Value)
```
Code generation does not support the syntaxes that include the grouping variable 'By',groupvar and the related output arguments pdca, gn, and gl.
fitdist supports code generation for beta, exponential, extreme value, lognormal, normal, and Weibull distributions.
- The value of distname can be 'Beta', 'Exponential', 'ExtremeValue', 'Lognormal', 'Normal' or 'Weibull'.
- The value of distname must be a compile-time constant.
The values of x, 'Censoring', and 'Frequency' must not contain NaN values.
Code generation ignores the 'Frequency' value for the beta distribution. Instead of specifying the 'Frequency' value, manually add duplicated values to x so that the values in x have the frequency you want.
Code generation does not support these input arguments: groupvar, NTrials, Theta, mu, Kernel, Support, and Width.
Names in name-value pair arguments must be compile-time constants.
These object functions of pd support code generation: cdf, icdf, iqr, mean, median, pdf, std, truncate, and var.

For more information on code generation, see Introduction to Code Generation and Code Generation for Probability Distribution Objects.
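
A sketch of an entry-point function that stays within the limitations listed above. The function name fitNormalCdf is hypothetical; it uses a constant distname, no grouping variable, and only the cdf object function.

```matlab
function p = fitNormalCdf(x,q) %#codegen
% Hypothetical entry-point function: fit a normal distribution to the
% column vector x and evaluate the fitted cdf at the points in q.
% x must not contain NaN values when generating code.
pd = fitdist(x,'Normal');   % distname is a compile-time constant
p = cdf(pd,q);
end
```

Because the fitted mean of a normal distribution is the sample mean, evaluating this function at q = mean(x) returns 0.5, which gives a quick sanity check in MATLAB before generating code.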

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

You cannot specify the input argument distname as 'Rician' or 'Stable'.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2009a



