
A blog about teaching Programming to non-CompSci students by Tim Love (Cambridge University Engineering Department). I do not speak on behalf of the university, the department, or even the IT group I belong to.

Monday 30 March 2015

Using Software Metrics in Automated Assessment

In an attempt to make assessment of code less subjective I've tried using cccc to gather software metrics for a year's worth (about 40 programs) of interdisciplinary projects, in the hope that some of the values would correspond to human-assessed software quality or general performance. The task is quite tightly constrained, so I was hoping that differences between metrics might be significant, though the task involved electronics and mechanics too, and students could choose whether or not to use O-O.

The cccc program measures simple features like lines of code, lines of comments, etc., but it also measures "information flow between modules" and "decision complexity". It's easy to use. The only problem in practice is that it needs to be given all and only the code used to produce the students' program. Some students' folders were very messy, and their version control was little more than commenting/uncommenting blocks of code, or saving old files with names like "nearlyready.cc". Some teams had main programs that incorporated calibration and testing code, whilst others wrote separate programs to perform these tasks. I tried to data-cleanse before running cccc, but the project was too open-ended to make comparisons fair.

I can't see any correlations useful to us, though there's a great variety in the code statistics (the code/comment ratio ranges from 1.3 to 59, for example). As a rule of thumb (and unsurprisingly) it seems that teams who use more than one source file tend to get decent marks. Those with a few physical source files but only one logical file (e.g. a main.cc with

#include "linefollow.cc"
#include "actuator.cc"

etc.) tended to fare poorly. Here's some sample cccc output along with the human marks.

[Table: per-team cccc output (Number of modules NOM, Lines of Code LOC, MVG/NOM, Lines of Comment COM, LOC/COM, MVG/COM, Information Flow measure/NOM) alongside the human marks. Software Mark (human): 56, 53, 71, 67, 75, 66, 62. General Performance (human): 65, 50, 80, 65, 75, 60, 60.]
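
To make the point about logical files concrete, here's a minimal sketch of the layout that the better-scoring teams tended towards (the file and function names are invented for illustration, not taken from any team's code): the interface goes in a header, the implementation in its own .cc file, and main.cc includes only the header and is linked against the separately compiled implementation.

// linefollow.h (illustrative): declarations only
#ifndef LINEFOLLOW_H
#define LINEFOLLOW_H
int read_line_sensor();
void steer_towards_line(int reading);
#endif

// linefollow.cc (illustrative): the implementation, compiled on its own
#include "linefollow.h"
int read_line_sensor() { return 0; /* hardware access omitted */ }
void steer_towards_line(int reading) { /* motor control omitted */ }

// main.cc: includes the header, not the .cc file
#include "linefollow.h"
int main()
{
    steer_towards_line(read_line_sensor());
    return 0;
}

// built with something like: g++ -o robot main.cc linefollow.cc

Structured this way, each .cc file really is a separate logical source file, which is the pattern that tended to earn decent marks above.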

What CCCC measures

CCCC creates web-page reports. It measures

  • Number of modules NOM
  • Lines of Code LOC
  • McCabe's Cyclomatic Number MVG
  • Lines of Comment COM
  • LOC/COM L_C
  • MVG/COM M_C
  • Information Flow measure (inclusive) IF4
  • Information Flow measure (visible) IF4v
  • Information Flow measure (concrete) IF4c
  • Lines of Code rejected by parser
  • Weighted Methods per Class (weighting = unity) WMC1
  • Weighted Methods per Class (weighting = visible) WMCv
  • Depth of Inheritance Tree DIT
  • Number of Children NOC (Moderate values of this measure indicate scope for reuse; however, high values may indicate an inappropriate abstraction in the design)
  • Coupling between objects CBO (The number of other modules which are coupled to the current module either as a client or a supplier. Excessive coupling indicates weakness of module encapsulation and may inhibit reuse)
  • Fan-in FI (The number of other modules which pass information into the current module)
  • Fan-out FO (The number of other modules into which the current module passes information)
  • Information Flow measure IF4 (A composite measure of structural complexity, calculated as the square of the product of the fan-in and fan-out of a single module)

McCabe's Cyclomatic Complexity measures the decision complexity of the functions, while the Information Flow measure gauges how much information flows between modules.
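
To make those two measures a little more tangible, here's a hand-worked illustration of my own (a toy function, not one taken from the student projects). The cyclomatic number of a function is essentially one more than the number of decision points it contains, so the function below, with one if, one for and one nested if, scores 4.

// Illustrative only: counting the decision points behind MVG by hand.
// One if, one for and one nested if give 3 decision points,
// so the cyclomatic number of this function is 3 + 1 = 4.
int count_line_hits(const int readings[], int n, int threshold)
{
    int hits = 0;
    if (n <= 0)                          // decision 1
        return 0;
    for (int i = 0; i < n; i++)          // decision 2
    {
        if (readings[i] > threshold)     // decision 3
            hits++;
    }
    return hits;
}

The information-flow arithmetic is just as mechanical: using the IF4 definition above, a module used by three other modules (fan-in 3) which itself calls into two others (fan-out 2) scores (3 × 2)² = 36, so even modest coupling inflates the figure quickly.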

See Also