Usage¶
General¶
Run pygount and specify the folder to analyze recursively, for example:
$ pygount ~/development/sometool
If you omit the folder, the current folder of your shell is used as the
starting point. Apart from folders, you can also specify single files and shell
patterns (using ?, * and ranges like [a-z]).
Certain files and folders are automatically excluded from the analysis:
- files starting with a dot (.) or ending in a tilde (~)
- folders starting with a dot (.) or named _svn
- --folders-to-skip LIST¶
- --names-to-skip LIST¶
To specify alternative patterns, use --folders-to-skip and --names-to-skip.
Both take a comma separated list of patterns; see Patterns below for the
syntax. For example, to also prevent folders starting with two underscores (__)
from being analyzed, specify --folders-to-skip=[...],__*.
- --suffix LIST¶
To limit the analysis to certain file types, specify a comma separated list of
suffixes to take into account, for example --suffix=py,sql,xml.
- --out FILE¶
By default, the results of the analysis are written to the standard output. To
redirect the output to a file, use for example --out=counts.txt.
To explicitly redirect to the standard output, specify --out=STDOUT.
- --format FORMAT¶
By default, the results of the analysis are written to the standard output in a
format similar to sloccount. To redirect the output to a file, use e.g.
--out=counts.txt. To change the format to an XML file similar to cloc, use
--format=cloc-xml.
To get a quick grasp of the languages used in a project and their respective
importance, use --format=summary, which provides a language overview and a sum
total. For example, pygount’s summary looks like this:
Language          Files       %  Code       %  Comment       %
----------------  -----  ------  ----  ------  -------  ------
Python               19   51.35  1924   72.99      322   86.10
reStructuredText      7   18.92   332   12.59        7    1.87
markdown              3    8.11   327   12.41        1    0.27
Batchfile             1    2.70    24    0.91        1    0.27
YAML                  1    2.70    11    0.42        2    0.53
Makefile              1    2.70     9    0.34        7    1.87
INI                   1    2.70     5    0.19        0    0.00
TOML                  1    2.70     4    0.15        0    0.00
Text                  3    8.11     0    0.00       34    9.09
----------------  -----  ------  ----  ------  -------  ------
Sum total            37          2636              374
The summary output is designed for human readers, and the column widths adjust to the data.
For further processing of pygount’s results, --format=json should be the
easiest to deal with. For more information see JSON.
Remote repositories¶
In addition to local files, pygount can analyze remote git repositories:
$ pygount https://github.com/roskakori/pygount.git
In the background, this creates a shallow clone of the repository in a temporary folder that is removed automatically after the analysis.
Therefore you need to have at least read access to the repository.
If you want to analyze a specific revision, specify it at the end of the URL:
$ pygount https://github.com/roskakori/pygount.git/v1.6.0
The remote URL supports the standard git protocols git, HTTP/S and SSH, for example:
$ pygount git@github.com:username/project.git
You can specify multiple repositories, for example to include the web application, command line client and docker container of the Weblate project:
$ pygount https://github.com/WeblateOrg/weblate.git https://github.com/WeblateOrg/wlc.git https://github.com/WeblateOrg/docker.git
And you can even mix local files and remote repositories:
$ pygount ~/projects/some https://github.com/roskakori/pygount.git
Patterns¶
Some command line arguments take patterns as values.
By default, patterns are shell patterns using *, ? and ranges like [a-z] as
placeholders. Depending on your platform, they are case sensitive (Unix) or not
(Mac OS, Windows).
If a pattern starts with [regex], you can instead specify a comma separated
list of regular expressions using all the constructs supported by the Python
regular expression syntax.
Regular expressions are case sensitive unless they include a (?i) flag.
If the first actual pattern is [...], default patterns are included.
Without it, defaults are ignored and only the patterns explicitly stated are
taken into account.
- --generated¶
For example, to specify that generated code can also contain the German word
“generiert” in a case insensitive way, use
--generated="[regex][...](?i).*generiert".
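To get a feel for the two pattern styles, the following minimal sketch uses Python's standard fnmatch and re modules. It is an illustration only and not necessarily how pygount matches patterns internally; the helper matches_any and the sample names are hypothetical.

```python
import fnmatch
import re


def matches_any(name, shell_patterns):
    """Return True if name matches at least one shell pattern (hypothetical helper)."""
    # fnmatchcase() always compares case sensitively, regardless of platform.
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in shell_patterns)


# Shell patterns: "*" matches any text, "?" a single character, "[a-z]" a range.
print(matches_any("__pycache__", ["__*"]))                # True
print(matches_any("report_a.txt", ["report_[a-z].txt"]))  # True
print(matches_any("report_1.txt", ["report_[a-z].txt"]))  # False

# Regular expression with the (?i) flag: matches regardless of case.
generated_regex = re.compile(r"(?i).*generiert")
print(bool(generated_regex.match("# Automatisch GENERIERT, nicht bearbeiten")))  # True
```

Without the (?i) flag, the last example would not match because the regular expression would then be case sensitive.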
Counting duplicates¶
- --duplicates¶
By default, pygount prevents multiple source files with exactly the same content from being counted more than once.
For two files to be considered duplicates, the following conditions must be met:
- Both files have the same size.
- Both files have the same MD5 hashcode.
This allows for an efficient detection with a very small possibility for false positives.
However, it does not detect files with only minor differences as duplicates. Examples are files that are identical except for additional white space, empty lines or different line endings.
If you still want to count duplicates multiple times, specify --duplicates.
This will also result in a minor performance gain for the analysis.
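The two conditions can be illustrated with a minimal sketch based on Python's standard hashlib module. It is not pygount's actual code; the helpers duplicate_key and find_duplicates and the sample files are hypothetical.

```python
import hashlib
import os
import tempfile


def duplicate_key(path):
    """Return (size, MD5 hex digest) for a file (hypothetical helper)."""
    with open(path, "rb") as source_file:
        content = source_file.read()
    return len(content), hashlib.md5(content).hexdigest()


def find_duplicates(paths):
    """Return the paths whose size and MD5 were already seen earlier in paths."""
    seen_keys = set()
    duplicates = []
    for path in paths:
        key = duplicate_key(path)
        if key in seen_keys:
            duplicates.append(path)
        else:
            seen_keys.add(key)
    return duplicates


# Three sample files: b.py equals a.py, c.py differs only in its line ending.
with tempfile.TemporaryDirectory() as folder:
    contents = {"a.py": b"print(1)\n", "b.py": b"print(1)\n", "c.py": b"print(1)\r\n"}
    paths = []
    for name, content in contents.items():
        path = os.path.join(folder, name)
        with open(path, "wb") as target_file:
            target_file.write(content)
        paths.append(path)
    duplicate_names = [os.path.basename(path) for path in find_duplicates(paths)]

print(duplicate_names)  # ['b.py']
```

Here c.py is not reported because its different line ending changes both its size and its hash.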
Source code encoding¶
- --encoding ENCODING[;FALLBACK]¶
When reading source code, pygount automatically detects the encoding. It uses a simple algorithm where it recognizes BOM, XML declarations such as:
<?xml encoding='cp1252'?>
and “magic” comments such as:
# -*- coding: cp1252 -*-
If the file does not have an appropriate heading, pygount attempts to read it using UTF-8. If this fails, it reads the file using a fallback encoding (by default CP1252) and ignores any encoding errors.
You can change this behavior using the --encoding option:
- To keep the automatic analysis and use a different fallback encoding, specify for example --encoding=automatic;iso-8859-15.
- To use an automatic detection based on heuristics, use --encoding=chardet. For this to work, the chardet package must be installed.
- To use a specific encoding (for all files analyzed), use for example --encoding=iso-8859-15.
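The default fallback behavior can be sketched roughly as follows. This is a simplified illustration, not pygount's actual implementation: it only handles a UTF-8 BOM and leaves out XML declarations and “magic” comments, and the helper decode_source is hypothetical.

```python
import codecs


def decode_source(data, fallback_encoding="cp1252"):
    """Decode bytes roughly as described above (hypothetical helper)."""
    # A UTF-8 byte order mark settles the encoding immediately.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    try:
        # Without any heading, attempt UTF-8 first.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Use the fallback encoding and ignore bytes that cannot be decoded.
        return data.decode(fallback_encoding, errors="ignore")


print(decode_source("größe = 1".encode("utf-8")))   # größe = 1
print(decode_source("größe = 1".encode("cp1252")))  # größe = 1
```

The second call fails to decode as UTF-8 and therefore ends up in the CP1252 fallback.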
Pseudo languages¶
If a source file is not counted, the number of lines is 0 and the language shown is a pseudo language indicating the reason:
- __binary__: used for binary files.
- __duplicate__: the source code is a duplicate as described at the command line option --duplicates.
- __empty__: the source code is an empty file with a size of 0 bytes.
- __error__: the source code could not be parsed, e.g. due to an I/O error.
- __generated__: the source code is generated according to the command line option --generated.
- __unknown__: pygments does not provide a lexer to parse the source code.
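When post-processing results, you may want to leave out these pseudo languages. A minimal sketch, assuming you already have a list of language names from some output format; the helper counted_languages is hypothetical and not part of pygount:

```python
# Pseudo language names as listed above.
PSEUDO_LANGUAGES = {
    "__binary__",
    "__duplicate__",
    "__empty__",
    "__error__",
    "__generated__",
    "__unknown__",
}


def counted_languages(languages):
    """Return only the names of real languages, preserving order (hypothetical helper)."""
    return [language for language in languages if language not in PSEUDO_LANGUAGES]


print(counted_languages(["Python", "__binary__", "YAML", "__empty__"]))  # ['Python', 'YAML']
```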
Other information¶
- --verbose¶
If --verbose is specified, pygount logs detailed information about what it is
doing.
- --help¶
To get a description of all the available command line options, run:
$ pygount --help
- --version¶
To get pygount’s current version number, run:
$ pygount --version