ORIGINAL RESEARCH
Evaluation of contemporary calcium scoring methods in peripheral artery disease
Ogawa K,1 Tan MT,2 Asif A,2 Lakshminarayan R2
Plain English Summary
Why we undertook the work: Calcium in the blood vessels is common in peripheral arterial disease. We wanted to compare some of the methods of assessing calcium in blood vessels and see if there are any differences in how each method scores the same amount of calcium.
What we did: Three interpreters trained in reporting vascular imaging each independently graded the arterial calcium on the CT scans of 100 patients. A CT scan is a set of cross-sectional images of a patient that allows doctors to view individual structures within the body. We then compared the results to see if there were any discrepancies.
What we found: We found that each method of assessing calcium for the same blood vessel can have a very different assessment of its severity. We also found that the methods of assessing calcium are not the same between different interpreters.
What this means: Current methods of calcium scoring may not be standardised between each other, resulting in different severity gradings between different interpreters. Current systems could be improved with more objective measurements. More research is required into creating generalisable calcium scoring systems.
Abstract
Background: Peripheral arterial calcification is an indicator of poor outcomes following endovascular revascularisation in peripheral arterial disease (PAD). Calcium scoring methods are often used in clinical trials and clinical practice to quantify calcium deposition in PAD. It is suggested that the amount of vessel calcification (VC) can have an impact on outcomes following endovascular intervention. However, there are few studies assessing the reproducibility of contemporary calcium scoring tools.
Methods: We analysed 100 pre-interventional computed tomography angiography images of randomly selected patients who had undergone peripheral angioplasty between June 2022 and December 2022, evaluating the Peripheral Artery Calcium Scoring System (PACSS), the Peripheral Academic Research Consortium (PARC) calcium scoring grade and the Fanelli grade. Three independent raters trained in VC scoring and vascular reporting graded all 100 patients using measurements obtained from multi-planar reconstructions, and the inter-rater and methodological agreement intraclass correlation coefficients (two-way random effects, absolute agreement, single rater) were calculated to determine the reliability of the VC scoring systems.
Results: There was a significant difference in the proportions of PARC (p<0.001) and PACSS grades (p<0.001) allocated by the three scorers. The inter-rater reliability of the grading systems ranged from 0.585 to 0.672, with reliability decreasing for moderate-to-severe and severe lesions. The methodological agreement of the scoring systems was 0.860, but decreased in grading of moderate-to-severe and severe lesions.
Conclusion: There is poor agreement between the three main calcium scoring systems. Calcium scoring systems for grading calcifications may suffer from poor reproducibility between vascular reporters. Those developing calcium scoring systems should reconsider variables susceptible to inter-rater variance, which might impact their use.
Introduction
Vessel calcification (VC) of the peripheral arteries poses a significant challenge to endovascular treatment of peripheral arterial disease (PAD)1 due to its association with poorer outcomes.2 The importance of VC is highlighted by its being a research priority for the assessment of PAD and chronic limb threatening ischaemia.3,4 To date, there has yet to be a conclusive validation of the contemporary calcium scoring systems. Three of the main scoring systems used are the Peripheral Artery Calcium Scoring System (PACSS), the Peripheral Academic Research Consortium (PARC) score and the Fanelli score, which can be used on angiography or on computed tomography (CT) angiography. These scores are frequently employed in studies assessing the efficacy of various interventions for PAD, and have been shown to be associated with poorer outcomes post-intervention.2 Although the scores were not developed to be interchangeable, studies assessing the impact of VC use one or a variant of these scoring systems.2 This implies that VC severity assessed by one scoring system would have a similar clinical impact to that of another scoring system, despite a lack of evidence showing equivalence or interchangeability. In recent years, studies have emerged questioning the reliability and accuracy of these scoring systems,5 which has important repercussions considering their use in clinical trials.
The aim of this study is to evaluate the reliability of contemporary VC scoring systems in assessing calcified lesion severity between different reporters, and to assess whether the severity gradings of different VC scoring systems are interchangeable.
Methods
Retrospective VC scoring was carried out on 100 patients who had undergone peripheral endovascular intervention in a tertiary centre from June 2022 to December 2022 and had a preoperative CT peripheral angiography. The CT angiography was carried out with a Canon Aquilion ONE (320 detectors) at 100 kV and 200 mA with a slice thickness of 1 mm. A total of 80 mL of iodinated contrast was injected at a rate of 4 mL/s. The 100 patients were randomly selected via simple computer-assigned randomisation from a pool of eligible records. The inclusion criteria were patients receiving an index procedure for endovascular treatment of PAD who had had a preoperative CT angiogram and had an identifiable infrainguinal index lesion on preoperative CT that was subsequently treated. Exclusion criteria were patients <18 years of age and those who had prior surgical or endovascular treatment of the target lesion.
Data on sex, age and peripheral calcium scores (PARC, PACSS and Fanelli) were collected. Procedure reports were used to identify the target (treated) lesion whose VC severity would be graded. The scores were evaluated on IMPAX CT multi-planar reconstruction at maximum intensity projection to view the whole extent of VC of the target lesion, with the inbuilt geometric measuring tool used for measurements. Three independent raters were selected who were trained in vascular reporting, the use of the multi-planar reconstruction technique on IMPAX and the use of the VC scoring systems. The information shared with them included the patient’s details, the CT images they were to review and the location of the segment of treated calcium that they would score. The identity of each rater has been anonymised in the reporting of their results.
Definitions
As the scoring grades of the PARC score, PACSS and the Fanelli score are different, we homogenised the grading such that each scoring system’s definition is assigned an analogous numerical grade. Comparisons between calcium scoring systems have been carried out in previous studies analysing their inter-rater reliability but, whilst Allan et al5 used a simplified binary assessment for VC (non-severe vs severe) for the purpose of analysing the impact of differing methods of assessing calcium length and extent, we have taken this further and extrapolated the grading to all levels of calcium severity. For instance, the PACSS and Fanelli grading schemes grade calcium on a scale of 0–4. We equated these numerical grades to a qualitative descriptor such as that described in the PARC score, which meant that a grade of 0 would be equated to no VC; a grade of 1 would be equated to focal VC; a grade of 2 would be equated to mild VC; a grade of 3 would be equated to moderate VC; and a grade of 4 would be equated to severe VC. Conversely, the descriptive grades were made analogous to the numerical grades for statistical analysis. A description of these grades and definitions is shown in Table 1.
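The homogenisation described above can be sketched as a simple two-way lookup. This is only an illustration of the mapping stated in the text; the names `GRADE_TO_DESCRIPTOR`, `DESCRIPTOR_TO_GRADE` and `to_numeric` are hypothetical and were not part of the study workflow.

```python
# Shared 0-4 scale as described in the text (illustrative names only).
GRADE_TO_DESCRIPTOR = {
    0: "none",
    1: "focal",
    2: "mild",
    3: "moderate",
    4: "severe",
}
DESCRIPTOR_TO_GRADE = {label: grade for grade, label in GRADE_TO_DESCRIPTOR.items()}


def to_numeric(grade):
    """Return the 0-4 grade for a numeric grade or a qualitative descriptor."""
    if isinstance(grade, int):
        if grade not in GRADE_TO_DESCRIPTOR:
            raise ValueError(f"grade out of range: {grade}")
        return grade
    key = grade.strip().lower()
    if key not in DESCRIPTOR_TO_GRADE:
        raise ValueError(f"unknown descriptor: {grade}")
    return DESCRIPTOR_TO_GRADE[key]
```

Keeping every score on the same integer scale is what allows the three systems to be compared directly in the later agreement analysis.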

Although the PACSS is normally assessed via high intensity fluoroscopy and digital subtraction angiography, PACSS scoring on CT imaging has been used in a number of studies6-8 despite a lack of evidence to show that there would be a significant difference in the appearance of calcium on CT imaging. Assessment based on CT imaging most closely resembles real-world practice of vascular care, with non-invasive investigation now becoming a norm for investigation of PAD prior to intervention.4
As the PARC score and PACSS used the measurement of lesion length, we defined lesion length as the length of VC that is continuous with the segment of the treated calcium lesion.
We used the intraclass correlation coefficient (ICC) to determine the reliability of the scoring systems. The ICC was used to determine the methodological agreement, which is how consistently the PACSS, PARC score and Fanelli scores agreed with each other within a single rater, as well as the inter-rater reliability, which is how consistently the same PACSS, PARC grade and Fanelli grades were applied to the same VC between the three different raters. We calculated the inter-rater reliability and methodological agreement for the overall grades. We also assessed the inter-rater reliability and methodological agreement of the moderate-to-severe grades and the severe grade by assessing them in a binary format as grades 0–2 versus grades 3–4, and grades 0–3 versus grade 4, respectively. This is because, methodologically, the PACSS, the PARC and the Fanelli scoring systems share similar assessments for moderate and severe VC grades, requiring either the presence of bilateral calcification or >180° of circumferential calcification (Table 1). Hence, an analysis of the moderate-to-severe VC grades will be able to identify if these definitions are reproducible between reporters. The PACSS and the PARC grades differ in their definition of severe VC in that the PARC grade uses a measurement of >180° circumferential calcification greater than half of the total lesion length whereas the PACSS uses an objective length of bilateral calcification >5 cm. The Fanelli grading system does not incorporate a length measurement at all (Table 1). The analysis of severe VC grades would be able to assess if the differences in these definitions impact the reproducibility of the scoring systems.
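The two binary formats described above (grades 0–2 versus 3–4, and grades 0–3 versus 4) amount to thresholding the homogenised 0–4 grade. A minimal sketch follows; the function name, the constants and the example grades are hypothetical and serve only to illustrate the cut-offs.

```python
MODERATE_TO_SEVERE = 3  # grades 0-2 versus grades 3-4
SEVERE = 4              # grades 0-3 versus grade 4


def dichotomise(grade, threshold):
    """Return 1 if the 0-4 grade is at or above the threshold, else 0."""
    if grade not in range(5):
        raise ValueError(f"grade must be an integer from 0 to 4, got {grade}")
    return 1 if grade >= threshold else 0


# Hypothetical grades for four lesions, not study data.
example_grades = [0, 2, 3, 4]
moderate_to_severe_flags = [dichotomise(g, MODERATE_TO_SEVERE) for g in example_grades]
severe_flags = [dichotomise(g, SEVERE) for g in example_grades]
```

The same agreement statistic can then be applied to the binary flags in place of the full grades.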
We also opted not to use the subtypes defined in PACSS, further denoting intimal, medial and mixed-type calcification as these subtypes are not used in the PARC or the Fanelli scoring system, limiting analysis of its inter-rater reliability.
Statistical analysis
Continuous values are presented as mean ± standard deviation. Ordinal values are assessed as proportions and compared with a χ2 test. Proportion comparisons are reported with χ2 and p values. Averages were compared using the Kruskal–Wallis test and reported with p values. The ICC was calculated to determine the methodological agreement and inter-rater reliability. The ICC model used is a two-way random effects, absolute agreement, single-rater model. ICC values are reported with 95% confidence intervals (CI) and with their reliability strength as defined by Koo et al,9 where <0.5 indicates poor reliability; 0.5–0.75 indicates moderate reliability; 0.75–0.9 indicates good reliability; and >0.9 indicates excellent reliability. Reliability strength was reported using the range of the 95% CI. Significance was evaluated at p<0.05. Statistical analysis was conducted with the use of PSPP v2.0.0.
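For readers unfamiliar with the ICC model named above (two-way random effects, absolute agreement, single rater, often written ICC(2,1)), the following minimal sketch computes it from the two-way ANOVA mean squares. It is not the PSPP routine used in this study; the function name `icc_2_1` and the toy ratings are assumptions made for illustration, and the confidence interval is not computed.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one score per rater.
    Formula: (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(ratings)       # number of subjects (lesions)
    k = len(ratings[0])    # number of raters (or scoring systems)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical 0-4 grades for five lesions scored by three raters.
toy_ratings = [
    [0, 0, 1],
    [2, 3, 2],
    [4, 4, 3],
    [1, 2, 2],
    [3, 4, 4],
]
toy_icc = icc_2_1(toy_ratings)
```

Because the absolute-agreement form includes the between-rater mean square in the denominator, a rater who grades systematically higher or lower than the others lowers the coefficient even when the rank order of lesions is preserved.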
Results
Demographics
After excluding ineligible patients, we identified a total of 147 patients and randomly selected 100 patients (68 men and 32 women) for analysis. Their average age was 70.8±12.5 years.
Outcomes
The mean PARC grade was 2.67±1.27, the mean PACSS grade was 2.53±1.33 and the mean Fanelli grade was 2.51±1.25. There was a significant difference in the average grades of the three VC scoring systems (p=0.017). There was a significant difference between raters in the proportions of the PARC and PACSS grades (χ2=43.2, p<0.001 and χ2=27.5, p<0.001, respectively). There was no significant difference in the proportion of Fanelli grades (χ2=8.9, p=0.347). Within each rater we found that there was a significant difference in the proportion of grades between the PARC, PACSS and Fanelli scoring systems (χ2=23.1, p=0.003; χ2=28.0, p<0.001; and χ2=43.2, p<0.001 for raters 1, 2 and 3, respectively). A summary of the grades is shown in Figures 1–3.
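As an illustration of how a χ2 comparison of grade proportions can be computed, the sketch below calculates the Pearson statistic for a contingency table and an upper-tail p value. It assumes a table of five grades by three raters (or three scoring systems), which gives 8 degrees of freedom; the degrees of freedom are not stated in the text, so this is an assumption, and the function names are hypothetical. The closed-form p value used here is valid only for even degrees of freedom.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Assumes every row and column contains at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_value_even_dof(stat, dof):
    """Upper-tail p value of the chi-square distribution for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive, even dof")
    half = stat / 2.0
    term = 1.0
    series = 1.0
    for i in range(1, dof // 2):
        term *= half / i      # (stat/2)^i / i!
        series += term
    return math.exp(-half) * series


# p value for the reported Fanelli statistic under the assumed 8 degrees of freedom.
p_fanelli = chi_square_p_value_even_dof(8.9, 8)
```

Under this assumption, χ2=8.9 gives a p value of about 0.35, close to the reported p=0.347; the small gap is consistent with rounding of the reported statistic.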



The overall methodological agreement was good to excellent with an ICC of 0.860 (95% CI 0.773 to 0.904). However, the methodological agreement for moderate-to-severe grading was moderate-to-excellent (0.872, 95% CI 0.773 to 0.967) and the methodological agreement for severe grading was poor-to-moderate (0.502, 95% CI 0.351 to 0.622) (Table 2).

The overall inter-rater reliability of the PARC calcium grading system was poor-to-moderate, with an ICC of 0.585 (95% CI 0.479 to 0.682). The inter-rater reliability of the PARC system for moderate-to-severe grades was also poor-to-moderate with an ICC of 0.525 (95% CI 0.411 to 0.631), and the reliability of its severe grades was poor with an ICC of 0.345 (95% CI 0.218 to 0.472).
The overall inter-rater reliability of PACSS was poor-to-moderate with an ICC of 0.603 (95% CI 0.492 to 0.700). The inter-rater reliability of PACSS for moderate-to-severe grades was also poor-to-moderate with an ICC of 0.484 (95% CI 0.366 to 0.600), and the reliability of its severe grades was poor-to-moderate with an ICC of 0.374 (95% CI 0.235 to 0.622).
The overall inter-rater reliability of the Fanelli grading system was moderate-to-good with an ICC of 0.672 (95% CI 0.579 to 0.753). The inter-rater reliability of the Fanelli grading system for moderate-to-severe grades was poor-to-moderate with an ICC of 0.577 (95% CI 0.470 to 0.676), and the reliability of its severe grades was poor-to-moderate with an ICC of 0.547 (95% CI 0.436 to 0.650).
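The reliability labels used in these paragraphs pair the Koo et al band of the lower 95% CI bound with the band of the upper bound. A minimal sketch of that rule is given below. The function names are hypothetical, and the handling of values falling exactly on a boundary (0.5, 0.75, 0.9) is an assumption, as the stated ranges share their end points.

```python
def koo_band(icc):
    """Map a single ICC value to a Koo et al reliability band."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def reliability_label(ci_low, ci_high):
    """Describe reliability by the bands spanned by the 95% CI."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    low_band = koo_band(ci_low)
    high_band = koo_band(ci_high)
    return low_band if low_band == high_band else f"{low_band}-to-{high_band}"


# Overall PARC interval reported in the text.
parc_overall = reliability_label(0.479, 0.682)
```

Labelling by the interval rather than the point estimate is the more conservative choice: an ICC of 0.585 would be "moderate" on its own, but its interval reaches below 0.5.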
Discussion
Severe VC when assessed via PACSS is known to be a risk factor for angioplasty-associated complications10,11 and in-stent restenosis,12 and lower PACSS grades are associated with better angioplasty success.13 However, the literature is less consistent on whether PACSS predicts the prognostic success of interventional and surgical procedures: Stavroulakis et al suggest that a PACSS grade of 4 did not impact the loss of patency of total lesion revascularisation14 and Yanaguichi et al suggest that higher PACSS grades do not predict wound recurrence rates post-endovascular therapy,15 whereas Dukic et al16 suggest that lower PACSS is associated with greater clinical success and Mori et al17 suggest that higher PACSS is associated with poorer clinical outcomes. A systematic review and meta-analysis of the impact of VC demonstrated that major adverse limb events, amputation, major adverse cardiovascular events and mortality were associated with more severe VC, but included a heterogeneous assessment of calcium scoring methods, further highlighting the need for reproducible and generalisable VC scoring tools that can be used in preoperative imaging.2
Apart from the original study by Fanelli et al linking calcification circumference >270°,18 no literature has been published on the clinical significance of the Fanelli and PARC scores. Attempts have been made to validate these grades with intravascular ultrasound (IVUS) but the literature is divided: Yin et al suggest that the PACSS and PARC scoring systems can be validated in grading superficial calcium,19 but Allan et al point out that the PACSS, PARC and Fanelli grades themselves fail to agree with each other, particularly in the assessment of severe VC.5 Our overall findings do not contrast greatly with the recent findings of Nugteren et al, where the inter-observer agreement of the PACSS score was moderate-to-substantial and the intra-observer agreement of the PACSS score was generally good. We do, however, differ in our findings when evaluating for moderate-to-severe VC, as expressed in their binary PACSS score, where they found better inter-observer agreement and similar intra-observer agreement compared with their assessment of the original PACSS.20 Our results instead show that inter-rater agreement decreases with discrimination of moderate/severe grade calcium when assessed by the PACSS, and reinforce the finding made by Allan et al that the inter-rater agreement of the contemporary calcium scoring systems is poor in the assessment of severe and moderate-to-severe VC. This difference might be due to prearranged criteria in their study that do not exist in the original PACSS criteria, such as the addition of multiple calcified plaques where the target lesion is >5 cm or where the length of the target lesion is equal to the length of contiguous stenosis >50% in the vessel. It is also acknowledged that these scoring arrangements were created to increase the inter-observer agreement.
We propose a few reasons why the inter-rater agreement of severe VC may be poor. First, the PACSS and PARC grade use both vessel length and circumferential coverage in their grading system, whilst the Fanelli grade exclusively uses circumferential coverage. Second, between PACSS and PARC grades, the vessel length is not evaluated in the same manner: PACSS uses a nominal lesion length (>5 cm) whilst PARC uses a proportional measurement (>50% of total lesion length). Third, circumferential coverage is also defined differently: PACSS uses unilateral or bilateral calcification without defining a specific arc, PARC uses a defined calcium arc (>180°) and the Fanelli grade stipulates a different calcium arc (>270°). Fourth, there is a significant difference even between raters as to what they view as the most clinically significant VC, especially as to whether VC distal and/or proximal to the target vessel should also be considered due to its possible implications for lesion crossing, prognosis, etc. We believe this to be the main reason why the three raters differed in the identification of the target lesion and why they differed in the identification of patients with no significant vessel calcification. We find that this inter-rater disagreement is the most important factor, particularly when assessing lesion length. Although the PACSS uses an objective cut-off of 5 cm, it is these differences in the assessment of the target lesion, and therefore lesion length, that resulted in poor inter-rater agreement. Furthermore, the greater reliability of the Fanelli grades might also be attributed to its having fewer areas where these discrepancies can occur and fewer relative measurements, whether length of calcium or the unilateral/bilateral appearance of calcium.
Strengths and limitations
One strength of this study is that, to our knowledge, it is the largest study to assess the inter-rater reliability of calcium scoring systems. It also uses non-invasive imaging for calcium scoring, which best reflects the use of these scoring systems in day-to-day clinical practice as well as research settings. The ICC is a more generalisable statistic for determining the reliability of these scoring systems than the Cohen’s kappa used in the study by Allan et al,5 as Cohen’s kappa is more suitable for nominal or binary assessments whereas the ICC is more suitable for analysing continuous values.21 In our case, VC grades are continuous as there is an established relationship that the higher the grade, the worse the severity of VC. The use of multiple raters also adds greater value to assessing the reliability of the calcium scoring systems.
One limitation of this study is that it does not point out which scoring system is ‘reliable’ as we did not provide a ‘gold-standard’ measurement with which to compare these scoring systems. However, in a study validating PARC and PACSS grading on catheter angiograms with IVUS, Yin et al found that IVUS was more sensitive to the extent of circumferential calcium as well as the length of calcium.19 In the absence of an agreed measurement standard using non-invasive imaging, this raises further questions concerning in what areas these scoring systems may be flawed and what characteristics of a lesion’s morphology should be considered to have a significant relation to clinical outcomes.
Conclusion
Inter-rater agreement and reliability of contemporary calcium scoring systems are poor in the assessment of severe VC. There is a lack of reproducibility of calcium scoring systems and universal measurements of VC between them. More literature is required to establish the link between predictable quantification of VC severity and clinical outcomes to best optimise outcomes in patients with PAD.

Article DOI:
Journal Reference:
J.Vasc.Soc.G.B.Irel. 2026;Online ahead of publication
Publication date:
June 25, 2026
Author Affiliations:
1. Hull York Medical School, University of York, Heslington, UK
2. Department of Radiology, Hull University Teaching Hospitals NHS Trust, Hull, UK
Corresponding author:
Dr Raghuram Lakshminarayan
Department of Radiology, Hull University Teaching Hospitals NHS Trust,
Hull, HU3 2JZ, UK
Email: [email protected]