Application of the PCA Method to Identify Factors Affecting Poverty Rates in West Kalimantan

Poverty is one of the most challenging problems faced by developing countries like Indonesia. Poverty is influenced by various factors such as the economy, education, population, and several other factors. In this study, factors that affect the poverty rate in West Kalimantan are identified using the Principal Component Analysis method. To identify existing factors and determine the causes of poverty in West Kalimantan, it is necessary to compact the data without changing the significance of the data. The factors used are education and economic factors consisting of thirteen variables that are thought to affect poverty. Based on the Principal Component Analysis method of thirteen variables, eight reduced variables were obtained that can be used to identify poverty. From the eight variables used, three main components were formed that affect the poverty rate in West Kalimantan with a total variance value of 85.417% and a correlation value of more than 0.5 for each component. This shows that the poverty rate in West Kalimantan can be identified with the main components formed to explain the factors used.


INTRODUCTION
The poverty rate in developing countries is a complicated problem to solve, even though several developing countries have succeeded in raising production and national income. The poverty condition of a country or region reflects the level of welfare of the population living there. If the welfare of the population is lacking, it affects the economy, education, and health, which are themselves factors of poverty. Anisa (2022) states that poverty is still an unresolved problem in all provinces in Indonesia, including West Kalimantan Province. West Kalimantan Province has the second-highest percentage of poor people in Kalimantan, after North Kalimantan Province.
According to Pasaribu et al. (2021), the Central Bureau of Statistics (BPS) in Indonesia uses criteria to assess poverty: BPS classifies as poor those people whose monthly per capita income or expenditure is below the poverty line. This research uses data on the percentage of poor people in West Kalimantan in 2022. The data, obtained from the BPS Kalimantan Barat (2023) website, are shown in Figure 1. According to BPS estimates, the poverty rate in West Kalimantan in 2022 was 6.63%, with 350,250 people living in poverty. Figure 1 shows that the district with the highest poverty percentage is Melawi Regency at 11.44%, and the district with the lowest is Kubu Raya Regency at 4.12%. This study examines the variables suspected of causing poverty in West Kalimantan based on the variation in poverty rates across districts and cities.
From the data in Figure 1, the factors influencing poverty in West Kalimantan are analyzed using the Principal Component Analysis (PCA) method. Through this method, the variables thought to affect poverty are reduced to as few as possible without losing the information contained in the original data.
Van Delsen et al. (2017) used principal component analysis (PCA) to identify the factors thought to influence price increases in Ambon City; their analysis formed one main component from the ten variables used. The main component was obtained from seven retained variables: X2 (prepared food, drinks, cigarettes, tobacco), X3 (housing, water, gas, electricity, fuel), X4 (clothing), X5 (health), X6 (education, recreation, sports), X8 (exchange rates), and X10 (imports). These are economic-needs factors that affect inflation in Ambon City, with a total variance of 77.788%.

Data and Data Sources
In this study, the data used are secondary data sourced from the website of the Central Bureau of Statistics (BPS) of West Kalimantan Province. The data fall into two categories of factors: the first is education and the second is the economy. These two factors are broken down into several variables that are thought to affect poverty, such as the SD/MI/Paket A Net Participation Rate.
In the education factor, the Net Participation Rate (APM) is the number of people who attend school at the level appropriate to their age, expressed as a percentage. The Gross Participation Rate (APK) is the number of people who attend school at a particular level regardless of their age, also expressed as a percentage. Meanwhile, the School Participation Rate (APS) variable is a comparison between APM and APS, also expressed as a percentage. The higher the APS in an area, the more people are attending school in that area. In research by Hikma et al. (2019), educational participation variables such as APM, APK, and APS affected the poverty rate in Central Java with an effect of 78.4%. Therefore, the APM, APK, and APS variables are used in this study to identify educational factors that affect poverty rates in West Kalimantan.
In addition to the participation rates, the Expected Years of Schooling and Average Years of Schooling variables, which come from the human development index, are also thought to affect the poverty rate. To'oki et al. (2023) found that Average Years of Schooling and Expected Years of Schooling simultaneously had a significant effect on poverty in Central Sulawesi Province in 2015-2019, so these variables are also used in this study.
In the economic factor, the Labor Force Participation Rate (TPAK) is the labor force as a percentage of the population of a region. Meanwhile, the Open Unemployment Rate (TPT) is the number of unemployed people as a percentage of the labor force. In the research of Tio et al. (2021), the Labor Force Participation Rate and the Open Unemployment Rate together had a significant influence on the poverty rate in Riau Province. Based on that research, the TPAK and TPT variables are used to identify the poverty rate in terms of economic factors.
The variables used are those thought to affect poverty. For the education factor, the APM, APK, and APS data are used because these variables are thought to be related to each other in determining the factors that affect poverty using the PCA method. The data used in this study are from 2022 because of the relationship between the variables and the PCA method that will be used.

Data Analysis Method
The method used in this research is Principal Component Analysis, often abbreviated as PCA. Halida (2020) describes Principal Component Analysis as a statistical technique for transforming most of the original variables used. The original variables, which are correlated with each other, are reduced to a new, smaller set of variables that are mutually uncorrelated. PCA thus helps reduce the data, making it easier to interpret. The PCA method reduces the original variables to n new variables that carry the same information as the original variables. The reduced variables are called principal components, or factors (Firdaus & Sonhaji, 2022). In this research the PCA computations are carried out with IBM SPSS Statistics 25. The method is used to reduce the variables that have been selected as factors thought to influence the poverty rate in West Kalimantan.
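As a rough illustration of what this reduction does (the study itself used IBM SPSS Statistics 25), the sketch below extracts the first principal component of a correlation matrix by power iteration. The 3 × 3 correlation matrix and the function name `leading_component` are assumptions made for the example; they are not data or code from the study.

```python
import math

def leading_component(corr, iterations=200):
    """Return (eigenvalue, unit eigenvector) for the largest eigenvalue
    of a symmetric correlation matrix, found by power iteration."""
    p = len(corr)
    vec = [1.0] + [0.0] * (p - 1)  # arbitrary starting vector
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # The Rayleigh quotient gives the eigenvalue of the converged vector.
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, vec

# Hypothetical correlation matrix: three variables, all pairwise r = 0.5.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
eigenvalue, weights = leading_component(corr)
share = eigenvalue / len(corr) * 100  # percent of total variance
print(round(eigenvalue, 3), round(share, 1))
```

In this toy matrix the first component has an eigenvalue of 2, so it carries two thirds of the total variance of the three standardized variables; further components would be obtained the same way after removing the first one.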
The following tests and outputs are used to analyze the variables with the Principal Component Analysis method.

a. KMO and Bartlett's Test
The Kaiser-Meyer-Olkin (KMO) statistic is an index that compares the magnitude of the correlation coefficients with the magnitude of the partial correlation coefficients. If the KMO value is between 0.5 and 1, factor analysis can be used. However, if the KMO value is less than 0.5, factor analysis is not feasible. The formula for calculating the KMO statistic, which evaluates sampling adequacy, is as follows.

$$\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}$$

where:
$r_{ij}$ : simple correlation coefficient between variables $i$ and $j$
$a_{ij}$ : partial correlation coefficient between variables $i$ and $j$

If the partial correlation coefficients are small compared with the correlation coefficients, the KMO value will be close to one. If the KMO value is small, the correlation between variables cannot be explained by the PCA method, so factor analysis should not be used.
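The KMO ratio can be sketched in a few lines of Python. The example below is a minimal illustration, assuming the standard definition in which the partial correlations are obtained from the inverse of the correlation matrix; the helper names `invert` and `kmo_and_msa` and the 3 × 3 matrix are hypothetical. The same ratio, restricted to one variable's row, gives that variable's individual MSA value.

```python
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kmo_and_msa(corr):
    """Return the overall KMO and the list of per-variable MSA values."""
    p = len(corr)
    inv = invert(corr)
    # Partial correlation of i and j, controlling for all other variables.
    partial = [[-inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                for j in range(p)] for i in range(p)]
    # Squared off-diagonal entries (the diagonal is excluded from the sums).
    r2 = [[corr[i][j] ** 2 if i != j else 0.0 for j in range(p)] for i in range(p)]
    a2 = [[partial[i][j] ** 2 if i != j else 0.0 for j in range(p)] for i in range(p)]
    msa = [sum(r2[i]) / (sum(r2[i]) + sum(a2[i])) for i in range(p)]
    total_r2 = sum(map(sum, r2))
    total_a2 = sum(map(sum, a2))
    return total_r2 / (total_r2 + total_a2), msa

# Hypothetical correlation matrix: three variables, all pairwise r = 0.5.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
kmo, msa = kmo_and_msa(corr)
print(round(kmo, 3), [round(m, 3) for m in msa])
```

For this matrix every partial correlation is 1/3, so the KMO is 0.25 / (0.25 + 1/9) = 9/13, which is above the 0.5 threshold.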
Bartlett's Test is conducted to determine whether the variables are correlated. If most of the variables' correlation coefficients are less than [...], then this method can be used.
Bartlett's Test hypotheses:
$H_0$ : the correlation matrix is an identity matrix
$H_1$ : the correlation matrix is not an identity matrix
Test statistic:

$$\chi^{2} = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln |R|$$

where $n$ is the number of observations, $p$ is the number of variables, and $|R|$ is the determinant of the correlation matrix.
Bartlett's Test thus determines whether the variables in the sample are correlated (van Delsen et al., 2017), while the KMO value determines whether the data are suitable for factor analysis.

b. The Measure of Sampling Adequacy (MSA)
The MSA results are used to evaluate the adequacy of the selected variables. If the MSA value of a variable is low, the variable must be removed so that the analysis can continue. MSA values range from 0 to 1, with the following criteria.
1. MSA value = 1 means that the variable can be predicted without error by the other variables.
2. MSA value > 0.5 means that the variable can still be predicted and analyzed further.
3. MSA value < 0.5 means that the variable cannot be predicted and cannot be analyzed further, so the variable must be excluded.

c. Communalities
The Communalities output is used to show whether each variable can explain the factors or not. A variable is considered capable of explaining the factors if its Extraction value is more than 0.50.
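The Extraction value in the Communalities output can be sketched as follows, assuming the usual PCA definition in which a variable's communality is the sum of its squared loadings on the retained components. The loadings below are hypothetical and are not taken from the study's SPSS output.

```python
def communalities(loadings):
    """Sum of squared loadings for each variable (one row per variable)."""
    return [sum(value ** 2 for value in row) for row in loadings]

# Hypothetical loadings of three variables on two retained components.
loadings = [[0.9, 0.2],
            [0.6, 0.3],
            [0.1, 0.8]]
extraction = communalities(loadings)

# Apply the 0.50 cut-off described in the text.
passes = [value > 0.50 for value in extraction]
print([round(v, 2) for v in extraction], passes)
```

Here the second variable has a communality of 0.45, so under the 0.50 rule it would not be considered adequately explained by the two components.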

d. Total Variance Explained
In the Total Variance Explained output, the components used are those that have an eigenvalue of more than one. Components with an eigenvalue of more than one are declared factor variances that can explain the variability of the specified variables. The following formula is used to calculate the factor variance.

$$\text{Factor Variance} = \frac{\text{total extraction sums of squared loadings}}{n} \times 100\%$$

ISSN 2808-2605 EISSN 2808-4497 Forum Analisis Statistik Juni 2023, 3 (1): 11-28

e. Scree Plot
The Scree Plot is also used to see the factors formed, in the same way as the Total Variance Explained output. A component can be used if it has an eigenvalue of more than one.

f. Component Matrix
The Component Matrix output table shows the distribution of the correlation values between the initial variables and the factors formed.

g. Rotated Component Matrix
The Rotated Component Matrix is the result of a rotation carried out to clarify the distribution of the variables shown in the Component Matrix output. In the Rotated Component Matrix output, the cut-off value is set at 0.55: a variable cannot be part of a component if its value on that component is less than 0.55. The correlation value indicates which factor the variable belongs to.
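The cut-off rule above can be sketched as a small grouping function. This is only an illustration: the variable names and rotated loadings are hypothetical, and taking the absolute value of the loading (so that strong negative loadings also count) is an assumption not stated in the text.

```python
def assign_components(rotated, cutoff=0.55):
    """Map each variable to the component with its largest absolute loading,
    or to None when that loading is below the cut-off."""
    groups = {}
    for name, row in rotated.items():
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        groups[name] = best + 1 if abs(row[best]) >= cutoff else None
    return groups

# Hypothetical rotated loadings of three variables on three components.
rotated = {
    "A": [0.91, 0.12, 0.05],
    "B": [0.20, -0.78, 0.31],
    "C": [0.40, 0.45, 0.50],
}
groups = assign_components(rotated)
print(groups)
```

Variable C has no loading of at least 0.55, so it is not assigned to any component.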

h. Component Transformation Matrix
The Component Transformation Matrix output indicates that factors suitable for use as a summary of the analyzed variables must have a correlation value greater than 0.5.
The number of main components in the Principal Component Analysis method is usually determined by checking three criteria in the SPSS output. The first criterion is an eigenvalue greater than one; the eigenvalue indicates how much of the variance of all the original variables a factor accounts for. The second criterion is that the variance explained is more than 80%. The third criterion is the scree plot, in which the points with an eigenvalue of more than one are counted (van Delsen et al., 2017).
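The first two criteria can be checked mechanically; the third is a visual reading of the scree plot and is not coded here. The sketch below assumes standardized variables, so that the total variance equals the number of variables. The eigenvalues and the function name `count_components` are hypothetical.

```python
def count_components(eigenvalues, min_cumulative=80.0):
    """Count the components with an eigenvalue above one and report whether
    their cumulative share of the variance exceeds the threshold."""
    total = len(eigenvalues)  # each standardized variable contributes variance 1
    kept = [value for value in eigenvalues if value > 1]
    cumulative = sum(kept) / total * 100
    return len(kept), cumulative, cumulative > min_cumulative

# Hypothetical eigenvalues for five standardized variables (they sum to 5).
eigenvalues = [2.5, 1.6, 0.5, 0.3, 0.1]
n_components, cumulative, enough = count_components(eigenvalues)
print(n_components, round(cumulative, 1), enough)
```

With these values two components pass the eigenvalue criterion, and together they explain about 82% of the variance, so the second criterion is met as well.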

RESULTS AND DISCUSSION
Analysis using the PCA method is carried out if there is a correlation between the variables. The purpose of principal component analysis (PCA) is to reduce the number of variables to a smaller number of new variables that are uncorrelated with one another (correlation = 0) (Hendro et al., 2012). The purpose of this research is to identify the factors that influence the poverty rate in West Kalimantan. The factors used consist of education and economic factors, which are divided into thirteen variables, as in Table 1. These thirteen variables from the 2022 data, thought to affect the poverty rate in West Kalimantan, are tested using the Principal Component Analysis method. The interpretation of the SPSS output for the Principal Component Analysis method is as follows.
Table 2 shows a Bartlett's Test of Sphericity value of 155.369, with a significance level of 0.000. Meanwhile, the Kaiser-Meyer-Olkin Measure of Sampling Adequacy value is 0.561. Because the KMO value is more than 0.5, the analysis can continue. The variables are then analyzed to determine which can be processed further and which must be excluded, using the MSA output in Table 3.
Table 3, Anti-image Correlation, rows X6 to X13 (rows X1 to X5 not shown; diagonal entries are marked "a"):

        X1     X2     X3     X4     X5     X6     X7     X8     X9    X10    X11    X12    X13
X6    .414   .342  -.493   .611   .371  .269a  -.060  -.581   .234  -.917   .652  -.301   .540
X7   -.019   .239  -.429   .201   .047  -.060  .248a   .245   .395  -.113   .482  -.104  -.318
X8    .616  -.560   .599  -.480   .269  -.581   .245  .418a  -.566   .488   .044   .452  -.723
X9   -.690   .856  -.929   .540  -.614   .234   .395  -.566  .387a  -.391   .023  -.503   .485
X10   .572  -.524   .621  -.761  -.231  -.917  -.113   .488  -.391  .448a  -.668   .359  -.610
X11   .037   .035  -.293   .369   .656   .652   .482   .044   .023  -.668  .479a  -.123  -.129
X12   .545  -.641   .519  -.521   .406  -.301  -.104   .452  -.503   .359  -.123  .472a  -.218
X13  -.817   .652  -.551   .669  -.316   .540  -.318  -.723   .485  -.610  -.129  -.218    ...

The MSA values in Table 3 are shown in the Anti-image Correlation rows, marked with "a". Because more than one variable has an MSA value of less than 0.5, the variable with the smallest MSA value, namely X1, is removed and the test is repeated without it. After this retest, some variables still have an MSA value of less than 0.5, so the test is repeated four more times, removing one variable at a time. The variables excluded, in order, are X7 with an MSA value of 0.130, then X6 with an MSA value of 0.246, then X4 with an MSA value of 0.288, then X2 with an MSA value of 0.428. After five retests, the remaining variables meet the MSA requirement, as shown in Table 4, so the analysis can continue. Table 4 shows that the eight variables tested, namely X3, X5, X8, X9, X10, X11, X12, and X13, meet the MSA requirement. These are the reduced variables that will form the main components.
The retests conducted to fulfill the MSA requirement also changed the KMO and Bartlett's Test values, as described in Table 5. After retesting, the Kaiser-Meyer-Olkin Measure of Sampling Adequacy value is 0.743, and the Bartlett's Test value is 64.696 with a significance value of 0.000. Factor analysis in this study can be continued because the KMO value and the Bartlett's Test significance meet the requirements.
Based on Table 6, each variable's initial value in the Communalities output is 1. Meanwhile, the extraction value shows how much of the variance of a variable the factors formed can explain. The lowest communalities value is for variable X8, at 0.681, indicating that the factors formed explain only 68.1% of the variance of the School Participation Rate SMP/MTs/Paket B variable. The highest communalities value is for variable X3, at 0.946, indicating that the factors formed explain 94.6% of the variance of the Net Participation Rate SMA/SMK/MA/Paket C variable. Likewise, the variables X5, X9, X10, X11, X12, and X13 can be explained by the factors formed. The greater the communalities value, the stronger the relationship between the variable and the factors formed.
The first criterion in determining the number of principal components is an eigenvalue greater than one; the eigenvalue indicates how much of the variance of all the original variables a factor accounts for. Only components with an eigenvalue of more than one are considered; a component with an eigenvalue of less than one cannot be used as a main component. From Table 7, three components have an eigenvalue of more than one, which means that the first criterion for determining the number of main components has been met. Each of the eight variables has a variance of one, so the total variance is 8 × 1 = 8. The variables are then summarized into three factors, each of which is detailed in Table 8. The total variance of the three factors in Table 8 is 85.417%, that is, the three factors explain 85.417% of the variance of the eight variables. Because the total variance is more than 80%, the second criterion for determining the number of main components has also been fulfilled. Based on Table 8, three factors can be formed as the main components because their eigenvalues are more than one and the total variance is more than 80%.
This is also evident in the Scree Plot, which fulfills the third criterion. In Figure 3, the points with an eigenvalue of more than one indicate the number of factors that will be formed.
The distribution of the eight variables across the three factors is described in the Component Matrix table, which gives the correlation between each variable and the first, second, and third factors. Because some variables still have no clearly dominant value, the factors must be rotated before the variables can be assigned to the first, second, and third components. The rotation of the component matrix resulted in Table 10, which shows the distribution of the variables more clearly than Table 9. The cut-off value is 0.55: variables with a factor loading of less than 0.55 cannot be included in a component, and the largest loading indicates whether the variable belongs to the first, second, or third component. The variables grouped into the three main components are listed in Table 11.
The Component Transformation Matrix output shows a correlation value of 0.633 for the first component, 0.708 for the second component, and 0.834 for the third component. It can be concluded that the components created can summarize the thirteen variables studied because the correlation value of each of the three main components is more than 0.5.
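As a quick check of the arithmetic behind the total-variance figure in this section: with eight standardized variables the total variance is 8, so a cumulative share of 85.417% corresponds to eigenvalues of the three retained components summing to roughly 6.833. This is a back-calculation from the reported percentage, not a value read from the SPSS output.

$$
\frac{85.417}{100} \times 8 \approx 6.833, \qquad 8 - 6.833 = 1.167 \quad (\approx 14.583\% \text{ of the total variance})
$$

The remaining five components therefore share about 1.167 units of variance among them.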

Conclusion
The Principal Component Analysis carried out on thirteen selected variables thought to affect the poverty rate in the districts/cities of West Kalimantan reduced them to eight variables, which form three main components.

Figure 1. Poverty Percentage in 2022 in West Kalimantan

Figure 2. Data Affecting the Poverty Rate in West Kalimantan in 2022

Figure 3. Scree Plot Output

Based on Figure 3, three component points have an eigenvalue of more than one. From the three criteria for determining the number of main components, three components are formed to identify the poverty rate in West Kalimantan.
The first component consists of two variables: the Gross Participation Rate of SMP/MTs/Paket B and the School Participation Rate of SMP/MTs/Paket B. The second component consists of three variables: the Net Participation Rate of SMA/SMK/MA/Paket C, the School Participation Rate of SMA/SMK/MA/Paket C, and the Labor Force Participation Rate. The third component consists of three variables: Expected Years of Schooling, Mean Years of Schooling, and the Open Unemployment Rate.

The variables thought to affect poverty are the SD/MI/Paket A Net Participation Rate, SMP/MTs/Paket B Net Participation Rate, SMA/SMK/MA/Paket C Net Participation Rate, SD/MI/Paket A Gross Participation Rate, SMP/MTs/Paket B Gross Participation Rate, SMA/SMK/MA/Paket C Gross Participation Rate, SD/MI/Paket A School Participation Rate, SMP/MTs/Paket B School Participation Rate, SMA/SMK/MA/Paket C School Participation Rate, Expected Years of Schooling, Labor Force Participation Rate, Open Unemployment Rate, and Average Years of Schooling.

Table 1. Variables in the Study

Table 2. KMO and Bartlett's Test Output

Table 3. Measure of Sampling Adequacy (MSA) Output

Table 4. MSA Output After Retesting

Table 5. KMO and Bartlett's Test Output After Retesting

Table 7. Total Variance Explained Output

Table 9. Component Matrix Output

Table 10. Rotated Component Matrix Output

Table 11. Variable Grouping

The following are the main factors influencing the poverty rate in districts and cities in West Kalimantan.
1. Principal component 1, whose members are the variables:
   a. X5 : Gross Participation Rate of SMP/MTs/Paket B
   b. X8 : School Participation Rate of SMP/MTs/Paket B
2. Principal component 2, whose members are the variables:
   a. X3 : Net Participation Rate of SMA/SMK/MA/Paket C
   b. X9 : School Participation Rate of SMA/SMK/MA/Paket C
   c. X12 : Labor Force Participation Rate
3. Principal component 3, whose members are the variables:
   a. X10 : Expected Years of Schooling
   b. X11 : Mean Years of Schooling
   c. X13 : Open Unemployment Rate