Binomial proportion confidence interval

In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trials). In other words, a binomial proportion confidence interval is an interval estimate of a success probability p when only the number of experiments n and the number of successes nS are known.

There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (success and failure), the probability of success is the same for each trial, and the trials are statistically independent. Because the binomial distribution is a discrete probability distribution (i.e., not continuous) and difficult to calculate for large numbers of trials, a variety of approximations are used to calculate this confidence interval, all with their own tradeoffs in accuracy and computational intensity.

A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a coin is flipped ten times. The observed binomial proportion is the fraction of the flips that turn out to be heads. Given this observed proportion, the confidence interval for the true probability of the coin landing on heads is a range of possible proportions, which may or may not contain the true proportion. A 95% confidence interval for the proportion, for instance, will contain the true proportion 95% of the times that the procedure for constructing the confidence interval is employed.[1]


Normal approximation interval

A commonly used formula for a binomial confidence interval relies on approximating the distribution of error about a binomially-distributed observation, $\hat{p}$, with a normal distribution.[2] This approximation is based on the central limit theorem and is unreliable when the sample size is small or the success probability is close to 0 or 1.[3]

Using the normal approximation, the success probability p is estimated as

$$p \approx \hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

or the equivalent

$$p \approx \frac{n_S}{n} \pm \frac{z}{n} \sqrt{\frac{n_S\, n_F}{n}},$$

where $\hat{p} = n_S / n$ is the proportion of successes in a Bernoulli trial process, measured with $n$ trials yielding $n_S$ successes and $n_F = n - n_S$ failures, and $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate $\alpha$. For a 95% confidence level, the error $\alpha = 1 - 0.95 = 0.05$, so $1 - \tfrac{\alpha}{2} = 0.975$ and $z = 1.96$.
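As a concrete sketch of the formula above (the helper name and the 7-heads-in-10-flips numbers are illustrative, not from the source), the Wald interval can be computed directly from $n_S$, $n$, and the probit:

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(n_s, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion.

    n_s: number of successes; n: number of trials.
    Note: the bounds can fall outside [0, 1], a known weakness
    of this approximation.
    """
    p_hat = n_s / n                                      # observed proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # probit of 1 - alpha/2
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Example: 7 heads in 10 coin flips at the 95% level
lo, hi = wald_interval(7, 10)
```

The interval is symmetric about $\hat{p}$, which is exactly why it misbehaves near 0 and 1: a large enough z pushes the bounds past the ends of the unit interval.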

An important theoretical derivation of this confidence interval involves the inversion of a hypothesis test. Under this formulation, the confidence interval represents those values of the population parameter that would have large p-values if they were tested as a hypothesized population proportion. The collection of values, $\theta$, for which the normal approximation is valid can be represented as

$$\left\{ \theta \;\middle|\; -z \le \frac{\hat{p} - \theta}{\sqrt{\tfrac{1}{n}\hat{p}(1-\hat{p})}} \le z \right\},$$

where $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution. Since the test in the middle of the inequality is a Wald test, the normal approximation interval is sometimes called the Wald interval, but it was first described by Pierre-Simon Laplace in 1812.[4]


Wilson score interval

The Wilson score interval is an improvement over the normal approximation interval in that the actual coverage probability is closer to the nominal value. It was developed by Edwin Bidwell Wilson (1927).[5]

Wilson started with the normal approximation to the binomial:

$$z \approx \frac{p - \hat{p}}{\sigma_n},$$

with the analytic formula for the sample standard deviation given by

$$\sigma_n = \sqrt{\frac{p(1-p)}{n}}.$$

Combining the two, and squaring out the radical, gives an equation that is quadratic in p:

$$(\hat{p} - p)^2 = z^2 \cdot \frac{p(1-p)}{n}.$$

Transforming the relation into a standard-form quadratic equation for p, treating $\hat{p}$ and n as known values from the sample (see prior section), and using the value of z that corresponds to the desired confidence for the estimate of p gives this:

$$\left(1 + \frac{z^2}{n}\right) p^2 - \left(2\hat{p} + \frac{z^2}{n}\right) p + \hat{p}^2 = 0,$$

where all of the values in parentheses are known quantities. The solution for p estimates the upper and lower limits of the confidence interval for p. Hence the probability of success p is estimated by

$$p \approx \frac{\hat{p} + \frac{z^2}{2n}}{1 + \frac{z^2}{n}} \pm \frac{z}{1 + \frac{z^2}{n}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}$$

or the equivalent

$$p \approx \frac{n_S + \tfrac{1}{2}z^2}{n + z^2} \pm \frac{z}{n + z^2} \sqrt{\frac{n_S\, n_F}{n} + \frac{z^2}{4}}.$$
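A minimal sketch of the closed-form solution, under the same illustrative setup as before (function name and example numbers are assumptions, not from the source):

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(n_s, n, confidence=0.95):
    """Wilson score interval: the two roots of the quadratic in p."""
    p_hat = n_s / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom    # weighted average of p_hat and 1/2
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

# Example: 7 heads in 10 flips; unlike the Wald interval,
# the bounds always stay inside [0, 1]
lo, hi = wilson_interval(7, 10)
```

Note that the center is not $\hat{p}$ but is pulled toward 1/2, which is the source of the interval's better behavior at extreme proportions.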

The practical observation from using this interval is that it has good properties even for a small number of trials and/or an extreme probability.

Intuitively, the center value of this interval is the weighted average of $\hat{p}$ and $\tfrac{1}{2}$, with $\hat{p}$ receiving greater weight as the sample size increases. Formally, the center value corresponds to using a pseudocount of $\tfrac{1}{2}z^2$, where z is the number of standard deviations of the confidence interval: add this number to both the count of successes and of failures to yield the estimate of the ratio. For the common two standard deviations in each direction interval (approximately 95% coverage, which itself is approximately 1.96 standard deviations), this yields the estimate $\tilde{p} = \frac{n_S + 2}{n + 4}$, which is known as the "plus four rule".
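The "plus four rule" described above amounts to a one-line adjustment of the point estimate (helper name is hypothetical):

```python
def plus_four_estimate(n_s, n):
    """'Plus four' point estimate: with z ~ 2, the pseudocount
    z**2 / 2 ~ 2 is added to both the success and failure counts."""
    return (n_s + 2) / (n + 4)

# Example: 7 heads in 10 flips; the estimate is pulled
# from 0.7 toward 1/2
p_tilde = plus_four_estimate(7, 10)   # 9/14
```

With no data at all (n = 0) the estimate is exactly 1/2, reflecting the pull toward the uninformed prior.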

Although the quadratic can be solved explicitly, in most cases Wilson's equations can also be solved numerically using the fixed-point iteration

$$p_{k+1} = \hat{p} \pm z \sqrt{\frac{p_k(1-p_k)}{n}}$$

with $p_0 = \hat{p}$.
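The iteration can be sketched as follows (function name and iteration count are illustrative assumptions); each choice of sign converges to one root of the quadratic, i.e. one end of the Wilson interval:

```python
from math import sqrt
from statistics import NormalDist

def wilson_bounds_fixed_point(n_s, n, confidence=0.95, iters=100):
    """Solve p = p_hat +/- z*sqrt(p*(1-p)/n) by fixed-point
    iteration, starting from p0 = p_hat."""
    p_hat = n_s / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    bounds = []
    for sign in (-1.0, 1.0):      # lower bound, then upper bound
        p = p_hat
        for _ in range(iters):
            p = p_hat + sign * z * sqrt(p * (1 - p) / n)
        bounds.append(p)
    return tuple(bounds)
```

Squaring the fixed-point condition recovers exactly the quadratic of the previous paragraphs, so the converged values agree with the explicit solution; the upper-bound iteration oscillates around its root before settling, which is why a generous iteration count is used here.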

The Wilson interval can be derived from Pearson's chi-squared test with two categories. The resulting interval,

$$\left\{ \theta \;\middle|\; -z \le \frac{\hat{p} - \theta}{\sqrt{\tfrac{1}{n}\theta(1-\theta)}} \le z \right\},$$

can then be solved for $\theta$ to produce the Wilson score interval. The test in the middle of the inequality is a score test.


References

  1. ^ Sullivan, Lisa (2017-10-27). "Confidence Intervals". Boston University School of Public Health.
  2. ^ Wallis, Sean A. (2013). "Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods" (PDF). Journal of Quantitative Linguistics. 20 (3): 178–208. doi:10.1080/09296174.2013.799918. S2CID 16741749.
  3. ^ Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001). "Interval Estimation for a Binomial Proportion". Statistical Science. 16 (2): 101–133. CiteSeerX 10.1.1.50.3025. doi:10.1214/ss/1009213286. MR 1861069. Zbl 1059.62533.
  4. ^ Laplace, Pierre Simon (1812). Théorie analytique des probabilités (in French). Ve. Courcier. p. 283.
  5. ^ Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association. 22 (158): 209–212. doi:10.1080/01621459.1927.10502953. JSTOR 2276774.
