Datasources

This module defines the datasource configuration classes for extraction.bootstrap.configuration: the DatasourceName enum, the abstract DatasourceConfiguration base class, and the DatasourceConfigurationRegistry.

DatasourceConfiguration

Bases: BaseConfigurationWithSecrets, ABC

Abstract base class for all data source configurations.

This class serves as the foundation for specific data source implementations, providing common configuration parameters. All concrete datasource configurations should inherit from this class.

Source code in src/extraction/bootstrap/configuration/datasources.py
class DatasourceConfiguration(BaseConfigurationWithSecrets, ABC):
    """
    Abstract base class for all data source configurations.

    This class serves as the foundation for specific data source implementations,
    providing common configuration parameters. All concrete datasource
    configurations should inherit from this class.
    """

    name: DatasourceName = Field(
        ..., description="The name of the data source."
    )
    export_limit: Optional[int] = Field(
        None, description="The export limit for the data source."
    )
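
The inheritance pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it uses standard-library dataclasses in place of BaseConfigurationWithSecrets (which is not shown on this page), a trimmed copy of DatasourceName, and a hypothetical PdfDatasourceConfiguration class with a hypothetical base_path field.

```python
from abc import ABC
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DatasourceName(str, Enum):
    """Trimmed copy of the enum so the snippet runs on its own."""

    PDF = "pdf"


@dataclass
class DatasourceConfiguration(ABC):
    """Simplified stand-in: same two fields, but no secrets handling."""

    name: DatasourceName
    export_limit: Optional[int] = None


@dataclass
class PdfDatasourceConfiguration(DatasourceConfiguration):
    """Hypothetical concrete configuration for the PDF datasource."""

    # Pin the name so every instance identifies itself as the PDF source.
    name: DatasourceName = DatasourceName.PDF
    # Hypothetical datasource-specific setting, not part of the original module.
    base_path: str = "."


config = PdfDatasourceConfiguration(export_limit=100, base_path="/data/pdfs")
```

Each concrete class inherits name and export_limit from the base and adds only the settings specific to its datasource.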

DatasourceConfigurationRegistry

Bases: ConfigurationRegistry

Registry for datasource configurations.

This registry manages all available datasource configurations and provides methods to access them based on their type.

Source code in src/extraction/bootstrap/configuration/datasources.py
class DatasourceConfigurationRegistry(ConfigurationRegistry):
    """
    Registry for datasource configurations.

    This registry manages all available datasource configurations and provides
    methods to access them based on their type.
    """

    _key_class = DatasourceName

    @classmethod
    def get_union_type(cls) -> List[DatasourceConfiguration]:
        """
        Returns the union type of all available datasources.

        This method provides a type hint representing a list of all possible
        datasource configurations, which can be used for validation or
        type checking when working with collections of datasources.

        Returns:
            List[DatasourceConfiguration]: A type representing a list of all
            registered datasource configurations.
        """
        return List[super().get_union_type()]

get_union_type() classmethod

Returns the union type of all available datasources.

This method provides a type hint representing a list of all possible datasource configurations, which can be used for validation or type checking when working with collections of datasources.

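Returns: List[DatasourceConfiguration], a type representing a list of all registered datasource configurations.
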
Source code in src/extraction/bootstrap/configuration/datasources.py
@classmethod
def get_union_type(cls) -> List[DatasourceConfiguration]:
    """
    Returns the union type of all available datasources.

    This method provides a type hint representing a list of all possible
    datasource configurations, which can be used for validation or
    type checking when working with collections of datasources.

    Returns:
        List[DatasourceConfiguration]: A type representing a list of all
        registered datasource configurations.
    """
    return List[super().get_union_type()]
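
To make the return value concrete, here is a rough sketch of how a list-of-union type can be assembled from registered classes. The parent ConfigurationRegistry is not shown on this page, so its register method and _objects storage below are assumptions made for illustration, as are the two concrete configuration classes.

```python
from typing import Dict, List, Type, Union, get_args, get_origin


class DatasourceConfiguration:
    """Stand-in for the abstract base class documented above."""


class ConfluenceConfiguration(DatasourceConfiguration):
    """Hypothetical concrete configuration."""


class PdfConfiguration(DatasourceConfiguration):
    """Hypothetical concrete configuration."""


class ConfigurationRegistry:
    """Hypothetical minimal registry; the real base class may differ."""

    _objects: Dict[str, Type[DatasourceConfiguration]] = {}

    @classmethod
    def register(cls, key: str, config_class: Type[DatasourceConfiguration]) -> None:
        cls._objects[key] = config_class

    @classmethod
    def get_union_type(cls):
        # Subscripting Union with a tuple is the same as Union[A, B, ...].
        return Union[tuple(cls._objects.values())]


class DatasourceConfigurationRegistry(ConfigurationRegistry):
    @classmethod
    def get_union_type(cls):
        # Wrap the parent's union in List[...], as the documented method does.
        return List[super().get_union_type()]


DatasourceConfigurationRegistry.register("confluence", ConfluenceConfiguration)
DatasourceConfigurationRegistry.register("pdf", PdfConfiguration)

union_list = DatasourceConfigurationRegistry.get_union_type()
(inner,) = get_args(union_list)  # the Union inside the List
```

The result is a typing construct, not a list instance: get_origin(union_list) is list, and get_args(inner) yields the registered classes. In this sketch, calling get_union_type on an empty registry raises TypeError, because a Union needs at least one member.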

DatasourceName

Bases: str, Enum

List of all available datasources.

Defines the supported data sources that can be used for extraction:

- CONFLUENCE: Atlassian Confluence wiki pages and spaces
- NOTION: Notion databases, pages, and blocks
- PDF: PDF documents from file system or URLs

Source code in src/extraction/bootstrap/configuration/datasources.py
class DatasourceName(str, Enum):
    """
    List of all available datasources.

    Defines the supported data sources that can be used for extraction:
    - CONFLUENCE: Atlassian Confluence wiki pages and spaces
    - NOTION: Notion databases, pages, and blocks
    - PDF: PDF documents from file system or URLs
    """

    CONFLUENCE = "confluence"
    NOTION = "notion"
    PDF = "pdf"
    BUNDESTAG = "bundestag"
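
Because DatasourceName derives from both str and Enum, its members can be looked up by value and compared with plain strings. The short sketch below repeats the enum so it runs on its own; the value "sharepoint" is only an example of an unsupported name.

```python
from enum import Enum


class DatasourceName(str, Enum):
    """Copy of the enum above so the snippet is self-contained."""

    CONFLUENCE = "confluence"
    NOTION = "notion"
    PDF = "pdf"
    BUNDESTAG = "bundestag"


# Look up a member from its raw string value, e.g. one read from a config file.
source = DatasourceName("notion")

# The str mixin makes members compare equal to plain strings.
is_notion = source == "notion"

# Unknown values raise ValueError, which surfaces typos in configuration early.
try:
    DatasourceName("sharepoint")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True

# All supported values, in declaration order.
supported = [member.value for member in DatasourceName]
```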