Scattering characteristics

However important the average characteristics are, an equally important characteristic of an array of numerical data is how the remaining members of the array behave relative to the average: how much they differ from it and how many of them differ from it significantly. In shooting practice one speaks of the accuracy (grouping) of the results; in statistics one studies the characteristics of scattering (spread).

The difference between a value x_i and the average value x̄ is called the deviation and is calculated as x_i − x̄. A deviation can be positive, if the value is greater than the average, or negative, if it is less than the average. In statistics, however, it is often important to have a single number that characterizes the "accuracy" of all numerical elements of a data array. Simply summing all the deviations always gives zero, because the positive and negative deviations cancel each other out. To avoid this, the squared deviations are used, or more precisely, the arithmetic mean of the squared deviations. This scattering characteristic is called the sample variance (dispersion):

$$D = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The greater the variance, the greater the scattering of the values of the random variable. When calculating the variance, the sample mean x̄ is used as an approximate value rounded with one extra digit relative to the members of the data array; otherwise, summing a large number of approximate values accumulates a significant error. The sample variance also has a drawback related to units: the unit of measurement of the variance D is the square of the unit of measurement of the values X that it characterizes. To remove this drawback, statistics introduces another scattering characteristic, the sample standard deviation, which is denoted σ (read "sigma") and is calculated by the formula

$$\sigma = \sqrt{D}$$

Normally, more than half of the members of the data array differ from the average by less than the standard deviation, i.e. they belong to the segment [x̄ − σ; x̄ + σ]. In other words, the average, taking the spread of the data into account, equals x̄ ± σ.

Yet another scattering characteristic is needed because of the units in which the members of a data array are measured. All numerical characteristics in statistics are introduced in order to compare the results of studying different numerical arrays that describe different random variables. However, comparing standard deviations of data sets with different average values is not informative, especially if the quantities are measured in different units: for example, when comparing the length and the weight of objects, or the scatter in the manufacture of micro- and macro-products. For these reasons, a relative scattering characteristic is introduced, called the coefficient of variation and calculated by the formula

$$V = \frac{\sigma}{\bar{x}} \cdot 100\%$$
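The three definitions above (D as the mean of the squared deviations, σ = √D and V = σ/x̄ · 100%) can be sketched in Python as follows. This is a minimal illustration that assumes raw, ungrouped data with a nonzero mean; the function name `scatter_characteristics` and the sample values are hypothetical.

```python
import math

def scatter_characteristics(values):
    """Return (mean, D, sigma, V%) for a list of numbers.

    D is the arithmetic mean of the squared deviations (division by n),
    sigma = sqrt(D), and V = sigma / mean * 100 (assumes mean != 0).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]      # x_i - mean; these sum to ~0
    d = sum(dev ** 2 for dev in deviations) / n  # sample variance D
    sigma = math.sqrt(d)                         # same units as the data
    v = sigma / mean * 100                       # coefficient of variation, %
    return mean, d, sigma, v

mean, d, sigma, v = scatter_characteristics([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, d, sigma, round(v, 1))  # 5.0 4.0 2.0 40.0
```

Squaring the deviations before averaging is what prevents the cancellation described above; the plain sum of `deviations` would be zero.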

To calculate the numerical characteristics of the scattering of random variable values, it is convenient to use a table (Table 6.9).

Table 6.9

Calculation of numerical characteristics of the scattering of random variable values

x_i    f_i    x_i·f_i    x_i − x̄    (x_i − x̄)²·f_i

The sample mean x̄ is calculated while filling out this table and is subsequently used in two forms. As the final average characteristic (for example, in the third column of the table), the sample mean x̄ must be rounded to the digit corresponding to the smallest digit of the members x_i of the numerical data array. However, x̄ is also used in the table for further calculations, namely in the fourth column, and there it must be rounded with a margin of one extra digit relative to the smallest digit of the members x_i.

Calculations with a table like Table 6.9 yield the value of the sample variance; to record the answer, the standard deviation σ must then be calculated from it.

The answer states: a) the average result, taking into account the spread of the data, in the form x̄ ± σ; b) the data stability characteristic V. The answer should also assess the coefficient of variation as good or bad.

In sports research, a coefficient of variation of 10–15% is considered acceptable as an indicator of the homogeneity or stability of results. A coefficient of variation V = 20% is considered very large in any research. If the sample size n > 25, then V > 32% is a very poor indicator.

For example, for the discrete variation series 1; 5; 4; 4; 5; 3; 3; 1; 1; 1; 1; 1; 1; 3; 3; 5; 3; 5; 4; 4; 3; 3; 3; 3; 3, Table 6.9 is filled out as follows (Table 6.10).

Table 6.10

An example of calculating the numerical characteristics of the scattering of values

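x_i    f_i    x_i·f_i    x_i − x̄    (x_i − x̄)²·f_i
1      7      7          −1.9       25.27
3      10     30         0.1        0.10
4      4      16         1.1        4.84
5      4      20         2.1        17.64
Σ      25     73                    47.85

$$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{73}{25} = 2.92 \approx 2.9$$

$$D = \frac{\sum (x_i - \bar{x})^2 f_i}{n} = \frac{47.85}{25} \approx 1.91$$

$$\sigma = \sqrt{D} \approx 1.4, \qquad V = \frac{\sigma}{\bar{x}} \cdot 100\% = \frac{1.4}{2.9} \cdot 100\% \approx 48\%$$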

Answer: a) the average, taking into account the spread of the data, is x̄ ± σ = 3 ± 1.4; b) the stability of the obtained measurements is low, since the coefficient of variation V = 48% > 32%.
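The table method can be reproduced in a few lines of Python, which may help when checking the arithmetic by hand. This sketch follows the rounding convention described above: x̄ is rounded with one extra digit (2.9) for the deviations and to the units digit (3) for the reported answer. Variable names are illustrative.

```python
import math
from collections import Counter

# The discrete variation series from the example (n = 25 values).
series = [1, 5, 4, 4, 5, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 5, 3, 5, 4, 4, 3, 3, 3, 3, 3]

freq = Counter(series)                       # f_i for each variant x_i
n = sum(freq.values())
mean_exact = sum(x * f for x, f in freq.items()) / n   # 73 / 25 = 2.92
mean_work = round(mean_exact, 1)             # working value, one extra digit: 2.9
mean_final = round(mean_exact)               # reported value, units digit: 3

ss = 0.0                                     # running sum of (x_i - mean)^2 * f_i
for x in sorted(freq):
    f = freq[x]
    dev = x - mean_work                      # fourth column: x_i - mean
    term = dev ** 2 * f                      # fifth column: (x_i - mean)^2 * f_i
    ss += term
    print(x, f, x * f, round(dev, 1), round(term, 2))

d = ss / n                                   # sample variance D (division by n)
sigma = math.sqrt(d)
v = sigma / mean_work * 100
print(f"x̄ ± σ = {mean_final} ± {sigma:.1f}, D = {d:.3f}, V = {v:.0f}%")
# x̄ ± σ = 3 ± 1.4, D = 1.914, V = 48%
```

Because the deviations use the rounded working mean 2.9, the sum of squares is 47.85; with the unrounded 2.92 it would be 47.84, which gives the same σ ≈ 1.4.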

An analogue of Table 6.9 can also be used to calculate the scattering characteristics of an interval variation series. In that case, the variants x_i are replaced by the interval representatives x_v, and the absolute frequencies f_i of the variants by the absolute frequencies f_v of the intervals.

Based on the above, the following conclusions can be drawn.

The conclusions of mathematical statistics are reliable when information about mass phenomena is processed.

Typically, a sample drawn from the general population of objects is studied; the sample must be representative.

Experimental data obtained by studying a property of the sample objects are values of a random variable, since the researcher cannot predict in advance which number will correspond to a particular object.

To select one or another algorithm for describing and initially processing experimental data, it is important to be able to determine the type of random variable: discrete, continuous or mixed.

Discrete random variables are described by a discrete variation series and its graphical form - a frequency polygon.

Mixed and continuous random variables are described by an interval variation series and its graphical form - a histogram.

When several samples are compared by the level of development of a certain property, the average numerical characteristics and the numerical characteristics of the scattering of the random variable around the average are used.

When calculating an average characteristic, it is important to choose the type of average that suits its area of application. The structural averages, the mode and the median, characterize the position of the variants in an ordered array of experimental data. The quantitative average (the sample mean) makes it possible to judge the average size of the variants.

To calculate the numerical characteristics of scattering - sample variance, standard deviation and coefficient of variation - the tabular method is effective.

Position characteristics describe the center of the distribution, but the values of the variants may be grouped around it in either a wide or a narrow band. Therefore, to describe a distribution, it is also necessary to characterize the range over which the values of the characteristic vary. Scattering characteristics serve this purpose; the most widely used are the range of variation, the variance (dispersion), the standard deviation and the coefficient of variation.

Range of variation is defined as the difference between the maximum and minimum value of a characteristic in the population being studied:

$$R = x_{\max} - x_{\min}$$

The obvious advantage of this indicator is its simplicity. However, since the range depends only on the extreme values of the characteristic, its use is limited to fairly homogeneous distributions. In other cases the range carries very little information, since many distributions that differ greatly in shape have the same range. In practical studies, the range of variation is sometimes used with small sample sizes (no more than 10). For example, the range makes it easy to assess how much the best and worst results in a group of athletes differ.

In this example:

R=16.36 – 13.04=3.32 (m).
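To illustrate the point that distributions of very different shape can share the same range, here is a small Python sketch. The two samples are hypothetical; the last line simply rechecks the arithmetic of the example.

```python
import statistics

def value_range(values):
    """Range of variation R = x_max - x_min."""
    return max(values) - min(values)

# Two hypothetical samples with the same extremes but very different shapes.
clustered = [10, 15, 15, 15, 15, 15, 15, 20]    # bulk of values at the centre
polarized = [10, 10, 10, 10, 20, 20, 20, 20]    # values piled at both extremes

print(value_range(clustered), value_range(polarized))              # 10 10
print(statistics.pstdev(clustered), statistics.pstdev(polarized))  # 2.5 5.0
print(round(16.36 - 13.04, 2))                                      # 3.32
```

Both samples have R = 10, yet their standard deviations differ by a factor of two, which is why the range alone says little about the shape of a distribution.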

The second characteristic of scattering is the variance (dispersion). The variance is the mean square of the deviation of a random variable from its mean; it characterizes the scattering, the spread of the values of a quantity around its average value. The word "dispersion" itself means "scattering."

In sample studies, an estimate of the variance must be established. The variance calculated from sample data is called the sample variance and is denoted S².

At first glance, the most natural estimate of the variance is the statistical variance, calculated from the definition:

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

In this formula, Σ(x_i − x̄)² is the sum of the squared deviations of the values x_i of the characteristic from the arithmetic mean x̄. To obtain the mean squared deviation, this sum is divided by the sample size n.

However, this estimate is biased. It can be shown that the sum of squared deviations of the values from the sample arithmetic mean is smaller than the sum of squared deviations from any other value, including the true mean (the mathematical expectation). Therefore the formula above contains a systematic error: it underestimates the variance. To eliminate the bias, it is enough to multiply by the correction factor n/(n − 1). The result is the following expression for the variance estimate:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

For large n, the biased and unbiased estimates naturally differ very little, and the correction factor becomes unnecessary. As a rule, the refined formula should be used when n < 30.
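A short Python sketch with hypothetical data that checks two claims made above: the sum of squared deviations is smallest about the sample mean, and the corrected estimate equals the biased one multiplied by n/(n − 1). The standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1) serve as a cross-check.

```python
import statistics

def ss_about(values, c):
    """Sum of squared deviations of the values from an arbitrary point c."""
    return sum((x - c) ** 2 for x in values)

def variance_biased(values):
    """Statistical variance: divide the sum of squares about the mean by n."""
    n = len(values)
    return ss_about(values, sum(values) / n) / n

def variance_unbiased(values):
    """Corrected estimate: multiply the biased one by n / (n - 1)."""
    n = len(values)
    return variance_biased(values) * n / (n - 1)

sample = [3.1, 2.8, 3.4, 3.0, 2.7]          # hypothetical measurements
mean = sum(sample) / len(sample)

# The sample mean minimises the sum of squared deviations; this is why
# dividing it by n tends to underestimate the variance.
for shift in (-0.5, -0.1, 0.1, 0.5):
    assert ss_about(sample, mean) < ss_about(sample, mean + shift)

print(round(variance_biased(sample), 4), round(variance_unbiased(sample), 4))  # 0.06 0.075
print(round(statistics.pvariance(sample), 4), round(statistics.variance(sample), 4))

# The correction factor n / (n - 1) approaches 1 as n grows.
for n in (5, 30, 100):
    print(n, round(n / (n - 1), 3))
```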

For grouped data, the formula for the unbiased estimate can be rewritten in the following form to simplify calculations:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{k} n_i (x_i - \bar{x})^2$$

where k is the number of grouping intervals;

n_i is the frequency of interval i;

x_i is the midpoint of interval i.
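The grouped-data formula can be sketched in Python as follows. The midpoints and frequencies are hypothetical (the grouped table of the running example is not reproduced here); treating every value in an interval as if it sat at the interval midpoint is the assumption built into the formula.

```python
import statistics

def grouped_variance(midpoints, frequencies):
    """Unbiased variance for grouped data:
    S^2 = sum(n_i * (x_i - mean)^2) / (n - 1), with x_i the interval midpoints."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
    return ss / (n - 1)

# Hypothetical grouping into k = 4 intervals (midpoints and frequencies n_i).
mids = [13.5, 14.5, 15.5, 16.5]
freqs = [3, 10, 12, 4]                      # n = 29

s2 = grouped_variance(mids, freqs)
# Cross-check: expand the groups back into individual values.
expanded = [x for x, f in zip(mids, freqs) for _ in range(f)]
print(round(s2, 4), round(statistics.variance(expanded), 4))
```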

As an example, let’s calculate the variance for the grouped data of the example we are analyzing (see Table 4.):

$$S^2 = \frac{\sum_{i=1}^{k} n_i (x_i - \bar{x})^2}{28} = 0.5473\ (\text{m}^2)$$

The variance of a random variable has the dimension of the square of the dimension of the random variable, which makes it hard to interpret. For a more intuitive description of scattering, it is more convenient to use a characteristic whose dimension coincides with that of the characteristic under study. For this purpose, the standard deviation (also called the root-mean-square deviation) is introduced.

The standard deviation is the positive square root of the variance:

$$S = \sqrt{S^2}$$

In our example, the standard deviation is

$$S = \sqrt{0.5473} \approx 0.74\ (\text{m})$$

The standard deviation has the same units of measurement as the measured values of the characteristic under study, and thus it characterizes the degree to which the characteristic deviates from the arithmetic mean. In other words, it shows how the bulk of the variants is located relative to the arithmetic mean.

Standard deviation and variance are the most widely used measures of variation. This is due to the fact that they are included in a significant part of the theorems of probability theory, which serves as the foundation of mathematical statistics. In addition, the variance can be decomposed into its component elements, which make it possible to assess the influence of various factors on the variation of the trait under study.

In addition to the absolute measures of variation, the variance and the standard deviation, statistics uses relative ones. The most common is the coefficient of variation, equal to the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$V = \frac{S}{\bar{x}} \cdot 100\%$$

From the definition it is clear that, in its meaning, the coefficient of variation is a relative measure of the dispersion of a characteristic.

For the example in question:

The coefficient of variation is widely used in statistical research. Being a relative value, it makes it possible to compare the variability of characteristics measured in different units, as well as of the same characteristic in several populations with different arithmetic means.

The coefficient of variation is used to characterize the homogeneity of the experimental data. In the practice of physical culture and sports, the spread of measurement results is considered small when V < 10%, medium when V = 11–20%, and large when V > 20%.

Restrictions on the use of the coefficient of variation stem from its relative nature: the definition contains normalization by the arithmetic mean. Consequently, when the absolute value of the arithmetic mean is small, the coefficient of variation may lose its information content; the closer the arithmetic mean is to zero, the less informative this indicator becomes. In the limiting case, when the arithmetic mean approaches zero (as can happen, for example, with temperature), the coefficient of variation tends to infinity regardless of the spread of the characteristic. By analogy with the case of error, the following rule can be formulated: if the arithmetic mean of the sample is greater than one, the coefficient of variation may legitimately be used; otherwise, the variance and the standard deviation should be used to describe the spread of the experimental data.

To conclude this part, let us consider the variability of the estimates themselves. As already noted, the values of distribution characteristics calculated from experimental data do not coincide with their true values for the general population. The true values cannot be established exactly, since, as a rule, it is impossible to survey the entire population. If the results of different samples from the same population are used to estimate the distribution parameters, the estimates differ from sample to sample: the estimated values fluctuate around the true ones.

Deviations of the estimates of population parameters from the true values of these parameters are called statistical errors. They arise because the sample size is limited: not all objects of the general population are included in the sample. The magnitude of statistical errors is estimated by the standard deviation of the sample characteristics.

As an example, consider the most important characteristic of position, the arithmetic mean. It can be shown that the standard deviation of the arithmetic mean is given by

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where σ is the standard deviation of the population.

Since the true value of the standard deviation is unknown, its sample estimate S is used instead, which gives a quantity called the standard error of the arithmetic mean:

$$S_{\bar{x}} = \frac{S}{\sqrt{n}}$$

This value characterizes the error made, on average, when the population mean is replaced by its sample estimate. According to the formula, increasing the sample size decreases the standard error in inverse proportion to the square root of the sample size.

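For the example under consideration, with n = 29 (so that n − 1 = 28, as in the variance calculation), the standard error of the arithmetic mean is

$$S_{\bar{x}} = \frac{0.74}{\sqrt{29}} \approx 0.14\ (\text{m})$$

In our case, it turned out to be 5.4 times smaller than the standard deviation.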
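A quick numerical check of the standard error, assuming n = 29 (inferred from the divisor n − 1 = 28 used for S² above) and S = √0.5473. It also illustrates the 1/√n behaviour: quadrupling the sample size, with the same S, halves the standard error.

```python
import math

def standard_error(s, n):
    """Standard error of the arithmetic mean, S / sqrt(n)."""
    return s / math.sqrt(n)

s = math.sqrt(0.5473)      # standard deviation from the grouped example, m
n = 29                     # assumption: S^2 was obtained by dividing by n - 1 = 28
se = standard_error(s, n)

print(round(s, 2), round(se, 2), round(s / se, 1))   # 0.74 0.14 5.4
# Quadrupling the sample size (same S) halves the standard error.
print(round(standard_error(s, 4 * n) / se, 3))       # 0.5
```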


One of the reasons for conducting statistical analysis is the need to take into account the influence of random factors (disturbances) on the indicator under study, which cause scattering of the data. Solving problems with scattered data involves risk: even if all the available information is used, it is impossible to predict exactly what will happen in the future. To handle such situations adequately, it is advisable to understand the nature of the risk and to be able to measure how widely a data set is dispersed. Three numerical characteristics describe dispersion: the standard deviation, the range and the coefficient of variation (variability). Unlike the typical indicators (mean, median, mode), which characterize the center, scattering characteristics show how close the individual values of the data set lie to this center.

Definition of standard deviation

The standard deviation is a measure of the random deviations of data values from the mean. In real life most data are scattered, i.e. individual values lie at some distance from the average.

A general characteristic of scattering cannot be obtained by simply averaging the deviations of the data, because some deviations are positive and others negative, so their average may equal zero. To get rid of the sign, the standard technique is used: first the variance is calculated as the sum of squared deviations divided by (n − 1), and then the square root of the result is taken. The formula for the standard deviation is:

$$S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$

Note 1: The variance carries no additional information compared with the standard deviation, but it is harder to interpret because it is expressed in "units squared," whereas the standard deviation is expressed in familiar units (for example, dollars).

Note 2: The formula above gives the standard deviation of a sample and is more precisely called the sample standard deviation. When calculating the standard deviation of a population (denoted σ), divide by n instead. The sample standard deviation is slightly larger (because it is divided by n − 1), which corrects for the randomness of the sample itself.

When a data set is normally distributed, the standard deviation takes on a special meaning. In the figure below, marks are placed on either side of the mean at distances of one, two and three standard deviations. Approximately 68% (roughly two thirds) of all values fall within one standard deviation on either side of the mean, about 95% fall within two standard deviations, and almost all of the data (99.7%) lie within three standard deviations of the mean.
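[Figure: normal distribution with marks at one, two and three standard deviations on either side of the mean]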


This property of the standard deviation for normally distributed data is called the “two-thirds rule.”
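For an exactly normal distribution, these shares follow from the error function: P(|x − μ| < kσ) = erf(k/√2). A minimal Python check using only `math.erf`:

```python
import math

def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations
    of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_share_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```

The exact one-standard-deviation share is about 68.3%; "two thirds" is a convenient rounding of it.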

In some situations, such as product quality control, limits are often set so that observations lying more than three standard deviations from the mean (about 0.3% of them) are treated as signaling a problem worth investigating.

Unfortunately, if the data do not follow a normal distribution, the rule described above cannot be applied.

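There is, however, a bound called Chebyshev's rule that can be applied to any distribution, including asymmetric (skewed) ones: at least 1 − 1/k² of the values lie within k standard deviations of the mean (for k > 1), i.e. at least 75% within two and at least 88.9% within three standard deviations.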
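Chebyshev's rule can be compared with the normal-distribution shares in a few lines of Python. This is a minimal sketch; unlike the normal shares, these values are guaranteed lower bounds that hold for any distribution with finite variance.

```python
def chebyshev_lower_bound(k):
    """Chebyshev's rule: for any distribution, at least 1 - 1/k**2 of the
    values lie within k standard deviations of the mean. For k <= 1 the
    bound is 0, i.e. it says nothing."""
    return max(0.0, 1 - 1 / k ** 2)

for k in (1, 1.5, 2, 3):
    print(k, f"{chebyshev_lower_bound(k):.1%}")
# 1 0.0%
# 1.5 55.6%
# 2 75.0%
# 3 88.9%
```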

Generating the initial data: a set of random variable values

Table 1 shows the daily profit on the stock exchange, recorded on working days from July 31 to October 9, 1987.

Table 1. Dynamics of changes in daily profit on the stock exchange

Daily profit   Daily profit   Daily profit
-0.006          0.009          0.012
-0.004         -0.015         -0.004
 0.008         -0.006          0.002
 0.011          0.002         -0.008
-0.001          0.011         -0.010
 0.017          0.013         -0.013
 0.017          0.002          0.009
-0.004         -0.018         -0.020
 0.008         -0.014         -0.003
-0.002         -0.001         -0.001
 0.006         -0.001          0.017
-0.017         -0.013          0.001
 0.004          0.030         -0.000
 0.015          0.007         -0.035
 0.001         -0.007          0.001
-0.005          0.001         -0.014
Launch Excel.

Create a file: click the Save button on the Standard toolbar. In the dialog box that appears, open the Statistics folder and name the file Scattering Characteristics.xls.

Set the labels: on Sheet1, in cell A1, enter the label Daily Profit, and in the range A2:A49 enter the data from Table 1.

Set the AVERAGE function: in cell D1, enter the label Average. In cell D2, calculate the average using the AVERAGE statistical function.

Set the STDEV function: in cell D4, enter the label Standard Deviation. In cell D5, calculate the standard deviation using the STDEV statistical function.

Round the results to four decimal places.

Interpretation of results. The average daily profit was −0.0004, i.e. a decline of 0.04% per day. This means that over the period considered the average daily profit was approximately zero: on average, the market held steady. The standard deviation turned out to be 0.0118. This means that one dollar ($1) invested in the stock market typically changed by about $0.0118 per day, i.e. the investment could produce a gain or a loss of roughly that size.
Let's check whether the daily profit values in Table 1 follow the rules of the normal distribution.

1. Calculate the interval corresponding to one standard deviation on either side of the mean.

2. In cells D7, D8 and F8, enter the labels One standard deviation, Lower bound and Upper bound, respectively.

3. In cell D9, enter the formula =D2-D5, and in cell F9, enter the formula =D2+D5.

4. Round the results to four decimal places.

5. Determine the number of daily profit values that lie within one standard deviation. First, filter the data, keeping the daily profit values in the range [−0.0121, 0.0114]. To do this, select any cell in column A containing daily profit values and run the command:

Data > Filter > AutoFilter

Open the menu by clicking the arrow in the Daily Profit header and select (Custom...). In the Custom AutoFilter dialog box, set the conditions "is greater than or equal to −0.0121" And "is less than or equal to 0.0114". Click OK.

To count the filtered values, select the range of daily profit values, right-click an empty area of the status bar and select Count from the context menu. Read the result. Then display all the original data with the command Data > Filter > Show All, and turn off the AutoFilter with the command Data > Filter > AutoFilter.

6. Calculate the percentage of daily profit values that lie within one standard deviation of the mean. To do this, enter the label Percent in cell H8, enter the percentage formula in cell H9, and round the result to one decimal place.

7. Calculate the interval of daily profit values within two standard deviations of the mean. In cells D11, D12 and F12, enter the labels Two standard deviations, Lower bound and Upper bound, respectively. Enter the calculation formulas in cells D13 and F13 and round the results to four decimal places.

8. Determine the number of daily profit values that lie within two standard deviations by first filtering the data.

9. Calculate the percentage of daily profit values that lie within two standard deviations of the mean. To do this, enter the label Percent in cell H12, enter the percentage formula in cell H13, and round the result to one decimal place.

10. Calculate the interval of daily profit values within three standard deviations of the mean. In cells D15, D16 and F16, enter the labels Three standard deviations, Lower bound and Upper bound, respectively. Enter the calculation formulas in cells D17 and F17 and round the results to four decimal places.

11. Determine the number of daily profit values that lie within three standard deviations by first filtering the data.

12. Calculate the percentage of these daily profit values. To do this, enter the label Percent in cell H16, enter the percentage formula in cell H17, and round the result to one decimal place.

13. Construct a histogram of the daily stock returns and place it, together with the frequency distribution table, in the range J1:S20. Mark on the histogram the approximate position of the mean and the intervals corresponding to one, two and three standard deviations from the mean.
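The spreadsheet results can be cross-checked outside Excel. This Python sketch is an illustrative alternative, not part of the original exercise: it uses `statistics.mean` and `statistics.stdev` (the latter divides by n − 1, like Excel's STDEV) and counts the Table 1 values that fall within one, two and three standard deviations of the unrounded mean. The values are listed column by column; the order does not affect these statistics.

```python
import statistics

# Daily profit values from Table 1 (48 working days), column by column.
profit = [
    -0.006, -0.004, 0.008, 0.011, -0.001, 0.017, 0.017, -0.004,
    0.008, -0.002, 0.006, -0.017, 0.004, 0.015, 0.001, -0.005,
    0.009, -0.015, -0.006, 0.002, 0.011, 0.013, 0.002, -0.018,
    -0.014, -0.001, -0.001, -0.013, 0.030, 0.007, -0.007, 0.001,
    0.012, -0.004, 0.002, -0.008, -0.010, -0.013, 0.009, -0.020,
    -0.003, -0.001, 0.017, 0.001, -0.000, -0.035, 0.001, -0.014,
]

mean = statistics.mean(profit)    # counterpart of Excel AVERAGE
sd = statistics.stdev(profit)     # counterpart of Excel STDEV (divides by n - 1)
print(f"mean = {mean:.4f}, sd = {sd:.4f}")   # mean = -0.0004, sd = 0.0118

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    inside = sum(low <= x <= high for x in profit)
    print(f"{k} sd: [{low:.4f}, {high:.4f}] {inside}/{len(profit)} = {inside / len(profit):.1%}")
# 1 sd: [-0.0121, 0.0114] 32/48 = 66.7%
# 2 sd: [-0.0239, 0.0232] 46/48 = 95.8%
# 3 sd: [-0.0357, 0.0349] 48/48 = 100.0%
```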

Scattering characteristics

Measures of sampling dispersion.

The minimum and maximum of a sample are, respectively, the smallest and largest values of the variable under study. The difference between the maximum and the minimum is called the range of the sample. All sample data lie between the minimum and the maximum; these indicators outline the boundaries of the sample.

Sample No. 1: R = 15.6 − 10 = 5.6

Sample No. 2: R = 0.85 − 0.6 = 0.25

The sample variance and the sample standard deviation are measures of the variability of a variable and characterize the degree of scattering of the data around the center. The standard deviation is the more convenient indicator because it has the same dimension as the data under study. Therefore, the standard deviation is used together with the arithmetic mean of the sample to describe the results of data analysis briefly.

It is more convenient to calculate the sample variance using the formula:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The standard deviation is calculated using the formula:

$$S = \sqrt{S^2}$$

The coefficient of variation is a relative measure of the dispersion of a characteristic:

$$V = \frac{S}{\bar{x}} \cdot 100\%$$

The coefficient of variation is also used as an indicator of the homogeneity of sample observations. It is believed that if the coefficient of variation does not exceed 10%, then the sample can be considered homogeneous, i.e., obtained from one general population.

Since the coefficient of variation does not exceed 10% in both samples, both samples are homogeneous.
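The homogeneity rule of thumb stated above (V not exceeding 10%) can be sketched as follows. The two data sets are hypothetical, since the values of samples No. 1 and No. 2 are not reproduced here; V is computed with the sample standard deviation (division by n − 1).

```python
import statistics

def coefficient_of_variation(values):
    """V = S / mean * 100%, with S the sample standard deviation (n - 1)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

def is_homogeneous(values, threshold=10.0):
    """Rule of thumb from the text: V <= 10% -> sample considered homogeneous."""
    return coefficient_of_variation(values) <= threshold

tight = [10.0, 10.5, 11.0, 9.5, 10.0]   # hypothetical, small spread
loose = [5, 10, 15, 20]                  # hypothetical, large spread

print(round(coefficient_of_variation(tight), 1), is_homogeneous(tight))  # 5.6 True
print(round(coefficient_of_variation(loose), 1), is_homogeneous(loose))  # 51.6 False
```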

A sample can be presented analytically as a distribution function, or as a frequency table consisting of two rows: the top row lists the sample elements (variants) in ascending order, and the bottom row lists the frequencies of the variants.

The frequency of a variant is the number of times that variant occurs in the sample.

Sample No. 1 “Mothers”

Type of distribution curve

Asymmetry, or the coefficient of skewness (a term first used by Pearson in 1895), is a measure of the skewness of a distribution. If the skewness clearly differs from 0, the distribution is asymmetric; the density of the normal distribution is symmetric about the mean.

The skewness index is used to characterize the degree of symmetry of the data distribution around the center. Skewness can take both negative and positive values. A positive value indicates that the bulk of the data (the peak of the curve) is shifted to the left of the center, with a long tail to the right; a negative value indicates that the bulk of the data is shifted to the right, with a long tail to the left. Thus, the sign of the skewness indicates the direction of the asymmetry, and its magnitude indicates its degree. Zero skewness indicates that the data are symmetrically concentrated around the center.

Because the skewness is positive, the peak of the curve is shifted to the left of the center.

The kurtosis coefficient characterizes how closely the bulk of the data is grouped around the center.

With positive kurtosis the curve is more peaked; with negative kurtosis it is flatter.

The curve is flattened;

The curve is more peaked.
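The signs discussed above can be checked with moment-based formulas. This sketch assumes one common definition, skewness g1 = m3 / m2^(3/2) and excess kurtosis g2 = m4 / m2² − 3 (zero for the normal distribution), where m_r are central moments computed with division by n; the text does not state its formulas, and spreadsheet functions such as Excel's SKEW and KURT apply small-sample corrections, so their values differ somewhat for small n. The data sets are hypothetical.

```python
def moment_skewness_kurtosis(values):
    """Moment-based skewness g1 = m3 / m2**1.5 and excess kurtosis
    g2 = m4 / m2**2 - 3, using central moments with division by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]            # flat and symmetric
right_tail = [1, 1, 1, 2, 2, 3, 10]    # peak on the left, long tail to the right

g1_sym, g2_sym = moment_skewness_kurtosis(symmetric)
g1_right, g2_right = moment_skewness_kurtosis(right_tail)
print(round(g1_sym, 3), round(g2_sym, 3))      # 0.0 -1.3  (symmetric, flattened)
print(round(g1_right, 3), round(g2_right, 3))  # positive skewness
```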