API

Overview

Pygount provides a simple API for integration into other tools. It is still a work in progress and subject to change.

Here’s an example of how to analyze one of pygount’s own source files:

>>> from pygount import SourceAnalysis
>>> SourceAnalysis.from_file("pygount/analysis.py", "pygount")
SourceAnalysis(path='pygount/analysis.py', language='Python', group='pygount', state=analyzed, code_count=509, documentation_count=141, empty_count=117, string_count=23)

Information about multiple source files can be summarized using ProjectSummary.

First, set up the summary:

>>> from pygount import ProjectSummary
>>> project_summary = ProjectSummary()

Next, find some files to analyze:

>>> from glob import glob
>>> source_paths = glob("pygount/*.py") + glob("*.md")
>>> source_paths
['pygount/command.py', 'pygount/analysis.py', 'pygount/write.py', 'pygount/__init__.py', 'pygount/xmldialect.py', 'pygount/summary.py', 'pygount/common.py', 'pygount/lexers.py', 'README.md', 'CONTRIBUTING.md', 'CHANGES.md']

Then analyze them:

>>> for source_path in source_paths:
...     source_analysis = SourceAnalysis.from_file(source_path, "pygount")
...     project_summary.add(source_analysis)

Finally, take a look at the information collected, for example by printing the values of ProjectSummary.language_to_language_summary_map:

>>> for language_summary in project_summary.language_to_language_summary_map.values():
...     print(language_summary)
...
LanguageSummary(language='Python', file_count=8, code=1232, documentation=295, empty=331, string=84)
LanguageSummary(language='markdown', file_count=3, code=64, documentation=0, empty=29, string=14)

Reference

Pygount counts lines of source code using pygments lexers.

class pygount.DuplicatePool

A pool that collects information about potential duplicate files.

duplicate_path(source_path: str) -> Optional[str]

Path to a duplicate for source_path or None if no duplicate exists.

Internally information is stored to identify possible future duplicates of source_path.
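The behavior described above can be illustrated with a minimal sketch. It assumes duplicates are detected by comparing file size and a content hash, which this reference does not confirm. SketchDuplicatePool and demo() are hypothetical names and not part of pygount.

```python
import hashlib
import os
import tempfile
from typing import Dict, Optional, Tuple


class SketchDuplicatePool:
    """Hypothetical stand-in for illustration; not pygount's DuplicatePool."""

    def __init__(self) -> None:
        # Maps (file size, content digest) to the first path seen with that content.
        self._key_to_path: Dict[Tuple[int, str], str] = {}

    def duplicate_path(self, source_path: str) -> Optional[str]:
        """Return an earlier path with identical content, or None."""
        with open(source_path, "rb") as source_file:
            content = source_file.read()
        key = (len(content), hashlib.sha256(content).hexdigest())
        # Remember the path on first sight; otherwise get the earlier path back.
        first_path = self._key_to_path.setdefault(key, source_path)
        return None if first_path == source_path else first_path


def demo() -> Tuple[Optional[str], bool]:
    """Write two identical files and query the pool for each of them."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "a.py")
        copy = os.path.join(folder, "b.py")
        for path in (original, copy):
            with open(path, "w", encoding="utf-8") as target_file:
                target_file.write("print('hello')\n")
        pool = SketchDuplicatePool()
        first_result = pool.duplicate_path(original)
        second_result = pool.duplicate_path(copy)
        return first_result, second_result == original
```

In this sketch the first query stores the file's key and returns None; the second query finds the stored key and returns the path of the first file.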

exception pygount.Error

Error to indicate that something went wrong during a pygount run.

class pygount.LanguageSummary(language: str)

Summary of source code counts from multiple files of the same language.

add(source_analysis: pygount.analysis.SourceAnalysis) -> None

Add counts from source_analysis to total counts for this language.

property code_count: int

sum lines of code for this language

property documentation_count: int

sum lines of documentation for this language

property empty_count: int

sum empty lines for this language

property file_count: int

number of source code files for this language

property is_pseudo_language: bool

True if the language is not a real programming language

property language: str

the language to be summarized

sort_key() -> Hashable

sort key to sort multiple languages by importance

property source_count: int

sum number of source lines of code

property string_count: int

sum number of lines containing only strings for this language

exception pygount.OptionError(message, source=None)

Error to indicate that a value passed to a command line option must be fixed.

class pygount.ProjectSummary

Summary of source code counts for several languages and files.

add(source_analysis: pygount.analysis.SourceAnalysis) -> None

Add counts from source_analysis to total counts.

property language_to_language_summary_map: Dict[str, pygount.summary.LanguageSummary]

A map containing summarized counts for each language added with add() so far.
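To make the idea of a per-language map concrete, here is a minimal sketch of the aggregation using plain dictionaries. SketchAnalysis and summarize() are hypothetical stand-ins rather than pygount classes, and the sketch makes no claim about how ProjectSummary is implemented. It computes source_count as code_count plus string_count, as documented for SourceAnalysis.source_count.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SketchAnalysis(NamedTuple):
    """Hypothetical stand-in for SourceAnalysis with only the count fields."""

    language: str
    code_count: int
    documentation_count: int
    empty_count: int
    string_count: int


def summarize(analyses: List[SketchAnalysis]) -> Dict[str, Dict[str, int]]:
    """Sum file and line counts per language."""
    language_to_counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for analysis in analyses:
        counts = language_to_counts[analysis.language]
        counts["file_count"] += 1
        counts["code_count"] += analysis.code_count
        counts["documentation_count"] += analysis.documentation_count
        counts["empty_count"] += analysis.empty_count
        counts["string_count"] += analysis.string_count
        # Source lines are code lines plus string-only lines.
        counts["source_count"] += analysis.code_count + analysis.string_count
    return {language: dict(counts) for language, counts in language_to_counts.items()}


# The first entry reuses the counts shown for pygount/analysis.py in the
# overview; the other two entries are invented for illustration.
EXAMPLE_ANALYSES = [
    SketchAnalysis("Python", 509, 141, 117, 23),
    SketchAnalysis("Python", 10, 2, 3, 1),
    SketchAnalysis("Markdown", 5, 0, 2, 0),
]
```

Calling summarize(EXAMPLE_ANALYSES) returns one dictionary of totals per language, keyed by the language name.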

class pygount.SourceAnalysis(path: str, language: str, group: str, code: int, documentation: int, empty: int, string: int, state: pygount.analysis.SourceState, state_info: Optional[str] = None)

Results from analyzing a source path.

Prefer the factory methods from_file() and from_state() to calling the constructor.

property code_count: int

number of lines containing code

property documentation_count: int

number of lines containing documentation (or comments)

property empty_count: int

number of empty lines, including lines containing only white space, white characters or white code words

See also: white_characters(), white_code_words()

static from_file(source_path: str, group: str, encoding: str = 'automatic', fallback_encoding: str = 'cp1252', generated_regexes=[re.compile('(?i).*automatically generated', re.IGNORECASE), re.compile('(?i).*do not edit', re.IGNORECASE), re.compile('(?i).*generated with the .+ utility', re.IGNORECASE), re.compile('(?i).*this is a generated file', re.IGNORECASE), re.compile('(?i).*generated automatically', re.IGNORECASE)], duplicate_pool: Optional[pygount.analysis.DuplicatePool] = None) -> pygount.analysis.SourceAnalysis

Factory method to create a SourceAnalysis by analyzing the source code in source_path.

Parameters
  • source_path – path to source code to analyze

  • group – name of a logical group the source code belongs to, e.g. a package.

  • encoding – encoding according to encoding_for()

  • fallback_encoding – fallback encoding according to encoding_for()

  • generated_regexes – list of regular expressions that, if found within the first few lines of a source file, identify it as generated source code for which SLOC should not be counted

  • duplicate_pool – a DuplicatePool where information about possible duplicates is collected, or None if possible duplicates should be counted multiple times.
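The generated_regexes check can be sketched roughly as follows. This is an illustration under stated assumptions, not pygount's implementation: matching_generated_line() is a hypothetical helper, and reading "the first few lines" as 8 lines is a guess. The two patterns are taken from the defaults in the signature above.

```python
import re
from typing import Iterable, List, Optional

# Two of the default patterns listed in the from_file() signature.
GENERATED_REGEXES: List[re.Pattern] = [
    re.compile(r"(?i).*automatically generated"),
    re.compile(r"(?i).*do not edit"),
]


def matching_generated_line(
    lines: Iterable[str],
    regexes: List[re.Pattern],
    max_line_count: int = 8,  # assumption: "the first few lines" taken as 8
) -> Optional[str]:
    """Return the first of the leading lines matching any regex, or None."""
    for line_index, line in enumerate(lines):
        if line_index >= max_line_count:
            # Only the head of the file is inspected.
            break
        if any(regex.match(line) for regex in regexes):
            return line
    return None
```

Because each pattern starts with `.*`, calling match() on a line behaves like a search anywhere within that line. A marker that appears only after the inspected head of the file is ignored in this sketch.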

static from_state(source_path: str, group: str, state: pygount.analysis.SourceState, state_info: Optional[str] = None) -> pygount.analysis.SourceAnalysis

Factory method to create a SourceAnalysis with all counts set to 0 and everything else according to the specified parameters.

property group: str

Group the source code belongs to; this can be any text useful to group the files later on. It is perfectly valid to put all files in the same group.

(Note: this property is mostly there for compatibility with the original SLOCCount.)

property is_countable: bool

True if source counts can be counted towards a total.

property language: str

The programming language the analyzed source code is written in; if state does not equal SourceState.analyzed this will be a pseudo language.

property source_count: int

number of source lines of code (the sum of code_count and string_count)

property state: pygount.analysis.SourceState

The state of the analysis after parsing the source file.

property state_info: Optional[Union[str, Exception]]

Possible additional information about state.

property string_count: int

number of lines containing only strings but no other code

class pygount.SourceScanner(source_patterns, suffixes='*', folders_to_skip=[re.compile('(?s:\\...*)\\Z'), re.compile('(?s:_svn)\\Z'), re.compile('(?s:__pycache__)\\Z')], name_to_skip=[re.compile('(?s:\\..*)\\Z'), re.compile('(?s:.*\\~)\\Z')])

Scanner for source code files matching certain conditions.

source_paths() -> Generator[str, None, None]

Paths to source code files matching all the conditions for this scanner.
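The default regular expressions in the SourceScanner signature look like the output of fnmatch.translate() for shell patterns. Assuming that is the case, the skipping logic can be sketched with the standard library alone. The shell patterns used below (`.?*`, `_svn`, `__pycache__`, `.*`, `*~`) are guesses at what lies behind those defaults, and compiled_patterns(), sketch_source_paths() and demo() are hypothetical helpers, not pygount functions.

```python
import fnmatch
import os
import re
import tempfile
from typing import Iterator, List


def compiled_patterns(shell_patterns: List[str]) -> List[re.Pattern]:
    """Compile shell-style patterns such as '.*' into regular expressions."""
    return [re.compile(fnmatch.translate(pattern)) for pattern in shell_patterns]


def sketch_source_paths(
    folder: str,
    folders_to_skip: List[re.Pattern],
    names_to_skip: List[re.Pattern],
) -> Iterator[str]:
    """Yield file paths below folder, pruning skipped folders and names."""
    for root, folder_names, file_names in os.walk(folder):
        # Prune in place so os.walk() does not descend into skipped folders.
        folder_names[:] = sorted(
            name for name in folder_names
            if not any(regex.match(name) for regex in folders_to_skip)
        )
        for file_name in sorted(file_names):
            if not any(regex.match(file_name) for regex in names_to_skip):
                yield os.path.join(root, file_name)


def demo() -> List[str]:
    """Build a small tree and return the relative paths that survive."""
    relative_paths = ("main.py", "main.py~", ".hidden", "__pycache__/main.pyc", "pkg/util.py")
    with tempfile.TemporaryDirectory() as folder:
        for relative_path in relative_paths:
            path = os.path.join(folder, *relative_path.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w", encoding="utf-8"):
                pass  # an empty file is enough for this demonstration
        folders_to_skip = compiled_patterns([".?*", "_svn", "__pycache__"])
        names_to_skip = compiled_patterns([".*", "*~"])
        return [
            os.path.relpath(path, folder).replace(os.sep, "/")
            for path in sketch_source_paths(folder, folders_to_skip, names_to_skip)
        ]
```

In the demo tree, the backup file, the hidden file and everything below __pycache__ are skipped, leaving main.py and pkg/util.py.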

class pygount.SourceState(value)

Possible values for SourceAnalysis.state.

analyzed = 1

successfully analyzed

binary = 2

source code is a binary

duplicate = 3

source code is an identical copy of another

empty = 4

source code is empty (file size = 0)

error = 5

source could not be parsed

generated = 6

source code has been generated

unknown = 7

pygments does not offer any lexer to analyze the source

pygount.encoding_for(source_path: str, encoding: str = 'automatic', fallback_encoding: Optional[str] = None) -> str

The encoding used by the text file stored in source_path.

The algorithm used is:

  • If encoding is 'automatic', attempt the following:

    1. Check BOM for UTF-8, UTF-16 and UTF-32.

    2. Look for XML prolog or magic heading like # -*- coding: cp1252 -*-

    3. Read the file using UTF-8.

    4. If all this fails, use the fallback_encoding and ignore any further encoding errors.

  • If encoding is 'chardet', use chardet to obtain the encoding.

  • For any other encoding simply use the specified value.
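The 'automatic' steps above can be sketched with the standard library. This is a simplified illustration, not pygount's code: sketch_encoding_for() is a hypothetical name, the regular expressions are assumptions about what counts as a magic heading or XML prolog, only the first two lines are searched, the 'chardet' branch is left out, and the fallback defaults to 'cp1252' as in from_file() rather than None.

```python
import codecs
import os
import re
import tempfile
from typing import List

# UTF-32 is tested before UTF-16 because the UTF-16 LE BOM is a prefix of
# the UTF-32 LE BOM.
_BOM_TO_ENCODING = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)
# Magic heading such as "# -*- coding: cp1252 -*-".
_MAGIC_REGEX = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.MULTILINE)
# XML prolog such as <?xml version="1.0" encoding="iso-8859-1"?>.
_XML_REGEX = re.compile(rb"""^<\?xml[^>]*encoding=["']([-\w.]+)["']""")


def sketch_encoding_for(
    source_path: str,
    encoding: str = "automatic",
    fallback_encoding: str = "cp1252",  # assumption: same default as from_file()
) -> str:
    """Guess the encoding of source_path following the documented steps."""
    if encoding == "chardet":
        raise NotImplementedError("the chardet branch is omitted in this sketch")
    if encoding != "automatic":
        return encoding  # any other value is used as given
    with open(source_path, "rb") as source_file:
        data = source_file.read()
    # 1. Check for a byte order mark.
    for bom, bom_encoding in _BOM_TO_ENCODING:
        if data.startswith(bom):
            return bom_encoding
    # 2. Look for an XML prolog or magic heading in the first two lines.
    head = b"\n".join(data.splitlines()[:2])
    match = _XML_REGEX.search(head) or _MAGIC_REGEX.search(head)
    if match is not None:
        return match.group(1).decode("ascii")
    # 3. Try to read everything as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Give up and use the fallback.
        return fallback_encoding


def demo() -> List[str]:
    """Detect the encoding of four small sample files."""
    samples = [
        codecs.BOM_UTF8 + b"x = 1\n",
        b"# -*- coding: cp1252 -*-\nname = '\xe4'\n",
        "name = '\u00e4'\n".encode("utf-8"),
        b"name = '\xe4'\n",  # neither declared nor valid UTF-8
    ]
    results = []
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.py")
        for data in samples:
            with open(path, "wb") as sample_file:
                sample_file.write(data)
            results.append(sketch_encoding_for(path))
    return results
```

The four samples in demo() exercise one step each: a BOM, a magic heading, valid UTF-8 without any declaration, and bytes that only the fallback can handle.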