Confluence Datasource
This module contains functionality related to the Confluence datasource.
Client
ConfluenceClientFactory
Bases: SingletonFactory
Factory for creating and managing Confluence client instances.
This factory ensures only one Confluence client is created per configuration, following the singleton pattern provided by the parent SingletonFactory class.
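The one-client-per-configuration behaviour described above can be illustrated with a minimal sketch. The implementation of SingletonFactory is not shown on this page, so `SingletonFactorySketch`, `ClientConfig`, and `FakeConfluenceClient` are hypothetical names used only for illustration; the real classes may key and build instances differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable


class SingletonFactorySketch:
    """Hypothetical stand-in for SingletonFactory: one instance per configuration."""

    def __init__(self, builder: Callable[[Hashable], Any]) -> None:
        self._builder = builder
        self._instances: Dict[Hashable, Any] = {}

    def create(self, configuration: Hashable) -> Any:
        # Build the instance only the first time a configuration is seen.
        if configuration not in self._instances:
            self._instances[configuration] = self._builder(configuration)
        return self._instances[configuration]


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical, hashable configuration used as the cache key."""
    host: str
    username: str


class FakeConfluenceClient:
    """Placeholder for a real Confluence client; it only stores its configuration."""

    def __init__(self, config: ClientConfig) -> None:
        self.config = config


factory = SingletonFactorySketch(FakeConfluenceClient)
first = factory.create(ClientConfig("wiki.example.com", "bot"))
second = factory.create(ClientConfig("wiki.example.com", "bot"))
other = factory.create(ClientConfig("docs.example.com", "bot"))
```

Here `first` and `second` refer to the same object because their configurations compare equal, while `other` is a separate instance.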
Source code in src/extraction/datasources/confluence/client.py
Configuration
ConfluenceDatasourceConfiguration
Bases: DatasourceConfiguration
Source code in src/extraction/datasources/confluence/configuration.py
base_url
property
Constructs the complete base URL for the Confluence API from the protocol and host.
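As a rough illustration of how such a property might combine the two fields, consider the sketch below. The class name `ConfluenceConfigSketch`, the field names `protocol` and `host`, and the default values are assumptions; the actual configuration class may use different fields or a different base class.

```python
from dataclasses import dataclass


@dataclass
class ConfluenceConfigSketch:
    """Hypothetical subset of the configuration; field names are assumptions."""
    protocol: str = "https"
    host: str = "wiki.example.com"

    @property
    def base_url(self) -> str:
        # Join protocol and host into the root URL used for API calls.
        return f"{self.protocol}://{self.host}"


config = ConfluenceConfigSketch(protocol="http", host="localhost:8090")
print(config.base_url)
```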
Document
ConfluenceDocument
Bases: BaseDocument
Document representation for Confluence page content.
Extends BaseDocument to handle Confluence-specific document processing including content extraction, metadata handling, and exclusion configuration.
Source code in src/extraction/datasources/confluence/document.py
Manager
ConfluenceDatasourceManagerFactory
Bases: Factory
Factory for creating Confluence datasource managers.
This factory generates managers that handle the extraction of content from Confluence instances. It ensures proper configuration, reading, and parsing of Confluence content.
Source code in src/extraction/datasources/confluence/manager.py
Parser
ConfluenceDatasourceParser
Bases: BaseParser[ConfluenceDocument]
Source code in src/extraction/datasources/confluence/parser.py
__init__(configuration, parser=MarkItDown())
Initialize the Confluence parser with the provided configuration.
Source code in src/extraction/datasources/confluence/parser.py
parse(page)
Parse a Confluence page into a document.
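A page-to-document conversion of this kind could look roughly like the sketch below. Everything in it is an assumption made for illustration: the page dictionary layout (`title`, `body.view.value`), the `DocumentSketch` and `ParserSketch` classes, and the tag-stripping `strip_tags` converter, which stands in for the injected parser and is not how MarkItDown works.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DocumentSketch:
    """Hypothetical document: extracted text plus metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def strip_tags(html: str) -> str:
    # Naive stand-in converter: drop anything that looks like an HTML tag.
    return re.sub(r"<[^>]+>", "", html).strip()


class ParserSketch:
    """Hypothetical parser that delegates conversion to an injected callable."""

    def __init__(self, converter: Callable[[str], str] = strip_tags) -> None:
        self._converter = converter

    def parse(self, page: dict) -> DocumentSketch:
        # Assumed page layout: rendered HTML under body.view.value.
        html = page["body"]["view"]["value"]
        return DocumentSketch(
            text=self._converter(html),
            metadata={"title": page["title"]},
        )


page = {"title": "Onboarding", "body": {"view": {"value": "<p>Hello <b>world</b></p>"}}}
document = ParserSketch().parse(page)
```

Injecting the converter keeps the parser testable without a real conversion library.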
Source code in src/extraction/datasources/confluence/parser.py
ConfluenceDatasourceParserFactory
Bases: Factory
Source code in src/extraction/datasources/confluence/parser.py
Reader
ConfluenceDatasourceReader
Bases: BaseReader
Reader for extracting documents from Confluence spaces.
Implements document extraction from Confluence spaces, handling pagination and export limits.
Source code in src/extraction/datasources/confluence/reader.py
__init__(configuration, client, logger=LoggerConfiguration.get_logger(__name__))
Initialize the Confluence reader.
Source code in src/extraction/datasources/confluence/reader.py
read_all_async()
async
Asynchronously fetch all documents from Confluence.
Retrieves pages from all global spaces in Confluence, respecting the export limit. Yields each page as a dictionary containing its content and metadata.
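The combination of per-space pagination and a global export limit can be sketched as an async generator. This is an illustration under stated assumptions, not the reader's actual code: `FakeClient`, its methods `list_global_space_keys` and `get_pages`, and the `export_limit` and `batch_size` parameters are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Optional


class FakeClient:
    """Hypothetical client; method names are illustrative, not a real API."""

    def __init__(self, spaces: Dict[str, List[dict]]) -> None:
        self._spaces = spaces

    def list_global_space_keys(self) -> List[str]:
        return list(self._spaces)

    def get_pages(self, space_key: str, start: int, limit: int) -> List[dict]:
        # Return one batch of pages, mimicking an offset-paginated endpoint.
        return self._spaces[space_key][start:start + limit]


async def read_all_async(
    client: FakeClient, export_limit: Optional[int] = None, batch_size: int = 2
) -> AsyncIterator[dict]:
    yielded = 0
    for space_key in client.list_global_space_keys():
        start = 0
        while True:
            remaining = None if export_limit is None else export_limit - yielded
            if remaining is not None and remaining <= 0:
                return  # export limit reached; stop across all spaces
            # Shrink the final batch so the limit is never exceeded.
            limit = batch_size if remaining is None else min(batch_size, remaining)
            pages = client.get_pages(space_key, start, limit)
            if not pages:
                break  # this space is exhausted; move to the next one
            for page in pages:
                yield page
                yielded += 1
            start += len(pages)


async def collect(client: FakeClient, export_limit: Optional[int] = None) -> List[dict]:
    return [page async for page in read_all_async(client, export_limit)]


client = FakeClient({
    "ENG": [{"id": "1"}, {"id": "2"}, {"id": "3"}],
    "OPS": [{"id": "4"}, {"id": "5"}],
})
limited = asyncio.run(collect(client, export_limit=4))
everything = asyncio.run(collect(client))
```

With an export limit of 4, the generator stops partway through the second space; without a limit it yields all five pages.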
Source code in src/extraction/datasources/confluence/reader.py
ConfluenceDatasourceReaderFactory
Bases: Factory
Factory for creating Confluence reader instances.
Creates and configures ConfluenceDatasourceReader objects with appropriate clients based on the provided configuration.
Source code in src/extraction/datasources/confluence/reader.py