Statistical significance: definition, concept, significance, regression equations and hypothesis testing. Level of statistical significance

Statistics has long been an integral part of life; people encounter it everywhere. On the basis of statistics, conclusions are drawn about where particular diseases are common and what is most in demand in a particular region or among a certain segment of the population. Even the political programs of candidates for government office are based on it. Retail chains use statistics when purchasing goods, and manufacturers are guided by these data in their offers.

Statistics plays an important role in the life of society and affects each individual member, even in small things. For example, if, according to statistics, most people in a particular city or region prefer dark colors in clothing, it will be extremely difficult to find a bright yellow raincoat with a floral print in local retail outlets. But what quantities make up the data that have such an impact? For example, what constitutes "statistical significance"? What exactly is meant by this term?

What is this?

Statistics as a science consists of a combination of different quantities and concepts. One of them is "statistical significance": a value of a variable for which the probability that other indicators will appear instead is negligible.

For example, suppose 9 out of 10 people put on rubber boots for a morning walk to pick mushrooms in an autumn forest after a rainy night. The likelihood that at some point 8 of them will instead be wearing canvas moccasins is negligible. Thus, in this specific example, the number 9 is a value that is called "statistically significant."

Accordingly, developing this practical example further, shoe stores stock up on rubber boots toward the end of the summer season in greater numbers than at other times of the year. In this way the magnitude of a statistical value has an impact on ordinary life.

Of course, in complex calculations, say, when forecasting the spread of viruses, a large number of variables are taken into account. But the very essence of determining a significant indicator of statistical data is similar, regardless of the complexity of the calculations and the number of variable values.

How is it calculated?

Equations are used when calculating the "statistical significance" indicator; that is, in this case everything is decided by mathematics. The simplest option is a chain of mathematical operations involving the following parameters:

  • two sets of results obtained from surveys or from the study of objective data (for example, the amounts for which purchases are made), denoted a and b;
  • the size of each group, n;
  • the proportion in the combined sample, p;
  • the standard error, SE.

The next step is to determine the overall test statistic t, whose value is compared with the number 1.96. Here 1.96 is the critical value bounding the central 95% range of the distribution (for large samples the Student's t-distribution approaches this normal-distribution value).

The question often arises of the difference between the values of n and p. This nuance is easily clarified with an example. Suppose we are calculating the statistical significance of loyalty to a product or brand among men and women.

In this case, the letters will denote the following:

  • n - number of respondents;
  • p - the number of people satisfied with the product.

The number of women surveyed will be designated n1, and the number of men n2. The subscripts "1" and "2" have the same meaning for the symbol p.

Comparing the test statistic with the critical values in Student's tables is what establishes "statistical significance."
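The chain of operations above can be sketched in code. This is a minimal illustration under the usual two-proportion formulas (pooled proportion and standard error); the function name and the sample numbers are invented for the example:

```python
import math

def two_proportion_test(satisfied1, n1, satisfied2, n2):
    """Compare two sample proportions, e.g. women vs men satisfied
    with a product. Returns the test statistic t; |t| > 1.96 means
    the difference is significant at the 95% confidence level."""
    a = satisfied1 / n1                        # share satisfied in group 1
    b = satisfied2 / n2                        # share satisfied in group 2
    p = (satisfied1 + satisfied2) / (n1 + n2)  # share in the combined sample
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error SE
    return (a - b) / se

# 120 of 200 women vs 90 of 200 men satisfied with the product
t = two_proportion_test(120, 200, 90, 200)
print(round(t, 2), abs(t) > 1.96)  # → 3.0 True
```

Since |t| exceeds 1.96, the difference in loyalty between the two groups in this invented example would be called statistically significant.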

What is meant by verification?

The results of any mathematical calculation can always be checked; children are taught this as early as primary school. It is logical to assume that, since statistical indicators are determined through a chain of calculations, they too are checked.

However, testing statistical significance is not just mathematics. Statistics deals with a large number of variables and probabilities, which are not always calculable. Returning to the rubber-boot example from the beginning of the article: the statistical reasoning on which store buyers rely may be disrupted by dry and hot weather untypical for autumn. As a result of such a phenomenon, the number of people purchasing rubber boots will decrease, and retail outlets will suffer losses. A mathematical formula, of course, is not able to anticipate a weather anomaly. This is what is called "error."

It is precisely the probability of such errors that is taken into account when checking the calculated level of significance. The check takes into account the calculated indicators, the accepted significance levels, and the values conventionally called hypotheses.

What is the significance level?

The concept of “level” is included in the main criteria for statistical significance. It is used in applied and practical statistics. This is a kind of value that takes into account the likelihood of possible deviations or errors.

The level is based on identifying differences in ready-made samples and makes it possible to establish whether those differences are significant or, conversely, random. The concept has not only numerical values but also interpretations that explain how each value is to be understood; the level itself is determined by comparing the result with the average index, which reveals the degree of reliability of the differences.

Thus, the concept of level can be understood simply: it is an indicator of the acceptable, probable error or inaccuracy in the conclusions drawn from the obtained statistical data.

What significance levels are used?

In practice, statistical significance is based on three basic levels of error probability.

The first level is the threshold at which the value is 5%: the probability of error does not exceed a significance level of 5%. This means 95% confidence that the conclusions drawn from the statistical research data are error-free.

The second level is the 1% threshold. Accordingly, this figure means that one can be guided by the data obtained during statistical calculations with 99% confidence.

The third level is 0.1%. With this value, the probability of an error is equal to a fraction of a percent, that is, errors are practically eliminated.

What is a hypothesis in statistics?

Errors as a concept are divided into two types, concerning the acceptance or rejection of the null hypothesis. A hypothesis is a concept behind which, by definition, lies a set of other data or statements: a description of the probability distribution of something related to the subject of statistical accounting.

In simple calculations there are two hypotheses: the null and the alternative. The difference between them is that the null hypothesis is based on the idea that there are no fundamental differences between the samples involved in determining statistical significance, while the alternative is its complete opposite: the alternative hypothesis asserts a significant difference in the sample data.

What are the errors?

Errors as a concept in statistics are directly dependent on the acceptance of one or another hypothesis as true. They can be divided into two directions or types:

  • the first type arises from rejecting the null hypothesis when it is in fact true;
  • the second arises from accepting the null hypothesis (rejecting the alternative) when the alternative is in fact true.

The first type of error is called false positive and occurs quite often in all areas where statistical data is used. Accordingly, the error of the second type is called false negative.
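The false-positive rate can be demonstrated with a small simulation. This is a sketch under an assumed setup: two samples are repeatedly drawn from the same normal population, so any "significant" difference a naive test reports is a Type I error; the function name and parameters are invented:

```python
import random
import statistics

def false_positive_rate(trials=2000, n=50, threshold=1.96, seed=1):
    """Draw two samples from the SAME population many times and count
    how often a z-type test wrongly declares a significant difference
    (a false positive, or Type I error)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
        t = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(t) > threshold:
            hits += 1
    return hits / trials

print(false_positive_rate())  # close to the 5% significance level
```

The observed rate hovers around 0.05, which is exactly the error probability that the first significance level promises to tolerate.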

What is regression used for in statistics?

The statistical significance of a regression is that it can be used to determine how well a model of various dependencies, calculated from the data, corresponds to reality; it makes it possible to establish whether the factors taken into account are sufficient, and to draw conclusions.

The significance of a regression is determined by comparing its results with the values listed in the Fisher tables, or by means of analysis of variance. Significant regression indicators are characteristic of complex statistical studies and calculations involving a large number of variables, random data and probable changes.

The concept of statistical significance

Statistical reliability is essential in calculation practice in the field of physical culture and sport (FCS). As noted earlier, multiple samples can be selected from the same population:

  • if the samples are selected correctly, their average indicators and the indicators of the general population differ from each other only slightly, within the representativeness error for the accepted reliability;
  • if they are selected from different populations, the difference between them turns out to be significant (statistics is all about comparing samples);
  • if the samples differ only insignificantly, i.e. actually belong to the same general population, the difference between them is called statistically unreliable.

A statistically reliable difference is one between samples that differ significantly and fundamentally, i.e. belong to different general populations.

In FCS, assessing the statistical significance of sample differences means solving many practical problems. For example, the introduction of new teaching methods, programs, sets of exercises, tests and control exercises is associated with their experimental verification, which should show that the test group differs fundamentally from the control group. For this purpose, special statistical methods called criteria of statistical reliability are used, which allow one to detect the presence or absence of a statistically significant difference between samples.

All criteria are divided into two groups: parametric and nonparametric. Parametric criteria require a normal distribution law, i.e. the mandatory determination of the main indicators of the normal law: the arithmetic mean X̄ and the standard deviation σ. Parametric criteria are the most accurate and correct. Nonparametric criteria are based on rank (ordinal) differences between sample elements.

Here are the main criteria of statistical reliability used in FCS practice: Student's t-test, Fisher's test, the Wilcoxon test, White's test, and the Van der Waerden test.

Student's t-test is named after the English scientist W. Gosset (Student was his pseudonym), who developed the method. The t-test is parametric and is used to compare the absolute indicators of samples; the samples may differ in size.

The Student's t-test is applied as follows.

1. Find the value of Student's t using the formula

t = |X̄1 − X̄2| / √(m1² + m2²),

where X̄1 and X̄2 are the arithmetic means of the compared samples, and m1 and m2 are the representativeness errors determined from the indicators of the compared samples.

2. Practice in FCS has shown that for sports work it is sufficient to accept the reliability P = 0.95. For reliability P = 0.95 (α = 0.05), with the number of degrees of freedom k = n1 + n2 − 2, we find from the table in Appendix 4 the limit (critical) value of the criterion, t_cr.

3. Based on the properties of the normal distribution law, the Student's test compares t with t_cr.

4. We draw conclusions:

if t > t_cr, the difference between the compared samples is statistically significant;

if t < t_cr, the difference is statistically insignificant.
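The four steps can be sketched in code. This follows the formula above, taking m = s/√n as each sample's representativeness error; the data and function name are invented for illustration:

```python
import math
import statistics

def student_t(sample1, sample2):
    """Student's t as described above: t = |x̄1 - x̄2| / sqrt(m1² + m2²),
    where m = s / sqrt(n) is a sample's representativeness error."""
    m1 = statistics.stdev(sample1) / math.sqrt(len(sample1))
    m2 = statistics.stdev(sample2) / math.sqrt(len(sample2))
    diff = abs(statistics.mean(sample1) - statistics.mean(sample2))
    return diff / math.sqrt(m1 ** 2 + m2 ** 2)

control      = [12.1, 12.4, 11.9, 12.6, 12.0, 12.3]  # e.g. sprint times, s
experimental = [11.2, 11.5, 11.0, 11.6, 11.3, 11.1]
t = student_t(control, experimental)
print(round(t, 1))  # compare with the table value t_cr for k = n1 + n2 - 2
```

With these invented data t comes out around 6.5, well above the tabulated t_cr of roughly 2.23 for k = 10 at P = 0.95, so the difference between the groups would be called statistically significant.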

For researchers in the field of FCS, assessing statistical significance is the first step in solving a specific problem: whether there is a fundamental difference between the compared samples. The next step is to assess this difference from a pedagogical point of view, which is determined by the conditions of the task.

The level of significance in statistics is an important indicator that reflects the degree of confidence in the accuracy and truth of the obtained (predicted) data. The concept is widely used in various fields: from conducting sociological research to statistical testing of scientific hypotheses.

Definition

The level of statistical significance (or statistically significant result) shows the probability of the occurrence of the studied indicators by chance. The overall statistical significance of a phenomenon is expressed by the p-value coefficient (p-level). In any experiment or observation, there is a possibility that the data obtained were due to sampling errors. This is especially true for sociology.

That is, a statistically significant value is a value whose probability of random occurrence is extremely small or which tends toward an extreme. The extreme in this context is the degree to which the test statistic deviates from the null hypothesis (the hypothesis that is tested for consistency with the obtained sample data). In scientific practice, the significance level is selected before data collection and, as a rule, its coefficient is 0.05 (5%). For systems where precise values are extremely important, this figure may be 0.01 (1%) or less.

Background

The concept of significance level was introduced by the British statistician and geneticist Ronald Fisher in 1925, when he was developing a technique for testing statistical hypotheses. When analyzing any process, there is a certain probability of certain phenomena. Difficulties arise when working with small (or not obvious) percentages of probabilities that fall under the concept of “measurement error.”

When working with statistical data that are not specific enough to test, scientists face the problem of the null hypothesis, which "prevents" operating with small quantities. For such systems Fisher proposed setting the probability of events at 5% (0.05) as a convenient cutoff that allows the null hypothesis to be rejected in calculations.

Introduction of fixed coefficients

In 1933, Jerzy Neyman and Egon Pearson recommended in their works that a certain level of significance be set in advance (before data collection). Examples of the use of these rules are clearly visible during elections. Suppose there are two candidates, one of whom is very popular and the other little known. It is obvious that the first candidate will win the election and that the chances of the second tend to zero. They tend to zero but are not equal to it: there is always the possibility of force majeure, sensational information or unexpected decisions that can change the predicted election results.

Neyman and Pearson agreed that Fisher's significance level of 0.05 (denoted by α) was the most appropriate. However, Fisher himself opposed fixing this value in 1956. He believed that the level of α should be set according to the specific circumstances; for example, in particle physics it is 0.01.

p-level value

The term p-value was first used by Brownlee in 1960. The p-level (p-value) is an indicator that is inversely related to the truth of the results: the higher the p-value, the lower the level of confidence in the relationship between variables found in the sample.

This value reflects the likelihood of errors associated with the interpretation of the results. Assume p-level = 0.05 (1/20). This indicates a five percent probability that the relationship between variables found in the sample is just a random feature of the sample. That is, if the dependence is actually absent, then in repeated similar experiments, on average, in every twentieth study the same or a greater dependence between the variables would be found. The p-level is often seen as a "margin" for the error rate.

Note that the p-value may not reflect the real relationship between variables, but only shows a certain average value within the assumptions made. In particular, the final analysis of the data also depends on the chosen value of this coefficient: at p-level = 0.05 there will be one set of results, and at a coefficient of 0.01 another.

Testing statistical hypotheses

The level of statistical significance is especially important when testing hypotheses. For example, when calculating a two-sided test, the rejection region is divided equally at both ends of the sampling distribution (relative to the zero coordinate) and the truth of the resulting data is calculated.

Suppose, when monitoring a certain process (phenomenon), it turns out that new statistical information indicates small changes relative to previous values. At the same time, the discrepancies in the results are small, not obvious, but important for the study. The specialist is faced with a dilemma: are changes really occurring or are these sampling errors (measurement inaccuracy)?

In this case, the null hypothesis is either retained (everything is attributed to error) or rejected (the change in the system is recognized as a fait accompli). The decision is based on the ratio of the overall statistical significance (p-value) to the significance level (α). If p-value < α, the null hypothesis is rejected; the smaller the p-value, the more significant the test statistic.

Values used

The level of significance depends on the material being analyzed. In practice, the following fixed values ​​are used:

  • α = 0.1 (or 10%);
  • α = 0.05 (or 5%);
  • α = 0.01 (or 1%);
  • α = 0.001 (or 0.1%).

The more accurate the calculations are required, the lower the α coefficient is used. Naturally, statistical forecasts in physics, chemistry, pharmaceuticals, and genetics require greater accuracy than in political science and sociology.

Significance thresholds in specific areas

In high-precision fields such as particle physics and manufacturing, statistical significance is often expressed as a number of standard deviations (denoted by the coefficient sigma, σ) relative to a normal probability distribution (Gaussian distribution). σ is a statistical indicator that determines the dispersion of the values of a quantity around its mathematical expectation, and it is used in plotting the probability of events.

Depending on the field of knowledge, the coefficient σ varies greatly. For example, the prediction of the existence of the Higgs boson used the parameter σ = 5, which corresponds to p-value = 1/3.5 million. In genome studies, a significance level of 5 × 10⁻⁸ is not uncommon for this area.
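The correspondence between σ and p-value can be checked with the standard normal tail probability. A minimal sketch, assuming the one-sided convention used for discovery claims; the function name is invented:

```python
import math

def sigma_to_p(sigma):
    """One-sided tail probability of a standard normal distribution
    beyond `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p5 = sigma_to_p(5)  # the "five sigma" discovery threshold
print(p5, round(1 / p5 / 1e6, 1))  # ≈ 2.87e-07, i.e. about 1 in 3.5 million
```

This reproduces the figure quoted above: a 5σ result corresponds to a chance of roughly 1 in 3.5 million.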

Efficiency

It must be taken into account that the coefficients α and p-value are not exact specifications. Whatever the level of significance of the phenomenon under study, it is not an unconditional basis for accepting a hypothesis. For example, the smaller the value of α, the greater the chance that an established hypothesis is significant; however, there is also a risk of error, which reduces the statistical power (sensitivity) of the study.

Researchers who focus solely on statistically significant results may reach erroneous conclusions, and their work is difficult to double-check because it rests on assumptions (in fact, on the chosen α and p-values). Therefore it is always recommended, along with calculating statistical significance, to determine another indicator: the magnitude of the statistical effect. Effect size is a quantitative measure of the strength of an effect.
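One common effect-size measure is Cohen's d, the difference of means expressed in units of the pooled standard deviation. A minimal sketch; the function name and the sample data are invented for illustration:

```python
import math
import statistics

def cohens_d(sample1, sample2):
    """Cohen's d: mean difference divided by the pooled standard
    deviation - an effect-size measure that complements the p-value."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / pooled

d = cohens_d([5.1, 5.3, 4.9, 5.2], [4.6, 4.8, 4.5, 4.7])
print(round(d, 2))  # well above the 0.8 conventionally called a "large" effect
```

Reporting d alongside the p-value tells the reader not only whether an effect is likely real, but also whether it is big enough to matter.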

STATISTICAL RELIABILITY

English: credibility/validity, statistical; German: Validität, statistische. Consistency, objectivity and lack of ambiguity in a statistical test or in a given set of measurements. Statistical reliability can be tested by repeating the same test (or questionnaire) on the same subject to see whether the same results are obtained, or by comparing different parts of a test that are supposed to measure the same object.

(Antinazi. Encyclopedia of Sociology, 2009)


Statistical significance, or the p-level of significance, is the main result of a statistical hypothesis test. In technical terms, it is the probability of obtaining a given sample result provided that the null statistical hypothesis is in fact true for the general population — that is, that there is no relationship. In other words, it is the probability that the detected relationship is random rather than a property of the population. It is statistical significance, the p-level, that provides a quantitative assessment of the reliability of a relationship: the lower this probability, the more reliable the relationship.

Suppose that comparing two sample means yielded a statistical significance level of p = 0.05. This means that testing the statistical hypothesis of equal means in the population showed that, if it is true, the probability of the random occurrence of the detected differences is no more than 5%. In other words, if two samples were repeatedly drawn from the same population, then in 1 case out of 20 the same or a greater difference between the means of these samples would be found. That is, there is a 5% chance that the detected differences are random in character and are not a property of the population.

In relation to a scientific hypothesis, the level of statistical significance is a quantitative indicator of the degree of distrust in the conclusion that a relationship exists, calculated from the results of a sample-based, empirical test of this hypothesis. The lower the p-level value, the higher the statistical significance of a research result confirming the scientific hypothesis.

It is useful to know what influences the level of significance. Other things being equal, significance is higher (the p-level value is lower) if:

  • the magnitude of the relationship (difference) is greater;
  • the variability of the trait(s) is less;
  • the sample size(s) is larger.
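These three factors can be seen directly in the shape of a two-sample test statistic, t = difference / (sd · √(2/n)). A minimal sketch under that assumed formula, with invented numbers:

```python
import math

def t_for(diff, sd, n):
    """Test statistic for a given mean difference, trait variability (sd)
    and per-group sample size n."""
    se = sd * math.sqrt(2 / n)  # standard error of the difference
    return diff / se

print(round(t_for(0.5, 1.0, 20), 2))  # baseline          → 1.58
print(round(t_for(1.0, 1.0, 20), 2))  # bigger difference → 3.16
print(round(t_for(0.5, 0.5, 20), 2))  # less variability  → 3.16
print(round(t_for(0.5, 1.0, 80), 2))  # larger samples    → 3.16
```

Each of the three changes pushes the statistic upward, and hence the p-level downward, exactly as the list above states.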

One-sided and two-sided significance tests

If the purpose of the study is to identify differences in the parameters of two general populations that correspond to different natural conditions (living conditions, age of the subjects, etc.), then it is often unknown which of these parameters will be greater and which smaller. For example, if one is interested in the variability of results in a control group and an experimental group, then, as a rule, there is no confidence about the sign of the difference between the variances or standard deviations from which variability is assessed. In this case the null hypothesis is that the variances are equal, and the purpose of the study is to prove the opposite, i.e. the presence of differences between the variances. The difference is allowed to be of either sign. Such hypotheses are called two-sided.

But sometimes the task is to prove an increase or decrease in a parameter; for example, that the average result in the experimental group is higher than in the control group. Here it is no longer allowed that the difference may be of the opposite sign. Such hypotheses are called one-sided.

Significance tests used to test two-sided hypotheses are called two-sided, and those for one-sided hypotheses are called one-sided.
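The practical difference is in how the p-value is computed from the same test statistic. A minimal sketch, assuming a standard normal statistic (the observed value 1.8 is an invented example):

```python
import math

def normal_sf(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.8  # observed test statistic
p_one_sided = normal_sf(z)           # H1: the parameter increased
p_two_sided = 2 * normal_sf(abs(z))  # H1: the parameter differs either way

print(round(p_one_sided, 3), round(p_two_sided, 3))  # → 0.036 0.072
# at alpha = 0.05 the one-sided test rejects H0, the two-sided does not
```

The same data can thus be "significant" under a one-sided test and not under a two-sided one, which is precisely why the choice of criterion must be fixed before the experiment.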

The question arises as to which criterion should be chosen in a given case. The answer is beyond the scope of formal statistical methods and depends entirely on the goals of the study. Under no circumstances should a criterion be chosen after the experiment has been conducted, based on an analysis of the experimental data, as this may lead to incorrect conclusions. If, before conducting the experiment, it is assumed that the difference between the compared parameters can be either positive or negative, then one should