Confluence Datasource
This module contains functionality related to the Confluence datasource.
Cleaner
ConfluenceCleaner
Bases: BaseCleaner
The ConfluenceCleaner class is a concrete implementation of BaseCleaner for cleaning Confluence documents.
Source code in src/embedding/datasources/confluence/cleaner.py
_get_documents_with_tqdm(documents)
staticmethod
Return the documents wrapped in a tqdm progress bar if GlobalSettings.SHOW_PROGRESS is True; otherwise return the documents unchanged.
:param documents: List of Confluence document objects
Source code in src/embedding/datasources/confluence/cleaner.py
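The conditional wrapping described above could look roughly like the sketch below. The function name and the `show_progress` argument (standing in for GlobalSettings.SHOW_PROGRESS) are illustrative, and the fallback when tqdm is not installed is an assumption, not documented behaviour.

```python
def get_documents_with_progress(documents, show_progress=False):
    """Return `documents`, wrapped in a tqdm progress bar only when requested."""
    if not show_progress:
        return documents
    try:
        from tqdm import tqdm  # third-party dependency, imported lazily
    except ImportError:
        # Assumption: degrade gracefully if tqdm is unavailable.
        return documents
    return tqdm(documents, desc="Cleaning documents")


plain = get_documents_with_progress(["a", "b"], show_progress=False)
wrapped = list(get_documents_with_progress(["a", "b"], show_progress=True))
```

Either way the caller iterates over the same items; only the progress reporting differs.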
clean(documents)
Clean the list of Confluence documents. Documents whose content is empty are omitted from the result.
:param documents: List of ConfluenceDocument objects
:return: List of cleaned ConfluenceDocument objects
Source code in src/embedding/datasources/confluence/cleaner.py
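A minimal sketch of the "skip empty content" rule follows. `DocSketch` is a hypothetical stand-in for ConfluenceDocument, and treating whitespace-only text as empty is an assumption; the actual cleaning steps are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class DocSketch:
    """Hypothetical stand-in for ConfluenceDocument."""
    text: str
    metadata: dict = field(default_factory=dict)


def clean(documents):
    """Return cleaned copies of the documents, dropping those with no content."""
    cleaned = []
    for doc in documents:
        text = doc.text.strip()  # assumption: whitespace-only counts as empty
        if not text:
            continue
        cleaned.append(DocSketch(text=text, metadata=doc.metadata))
    return cleaned


cleaned_docs = clean([DocSketch("  Release notes  "), DocSketch("   "), DocSketch("")])
```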
Document
ConfluenceDocument
Bases: BaseDocument
Document representation for Confluence page content.
Extends BaseDocument to handle Confluence-specific document processing including content extraction, metadata handling, and exclusion configuration.
Note
Handles conversion of HTML content to markdown and manages metadata filtering for both embedding and LLM contexts.
Source code in src/embedding/datasources/confluence/document.py
_get_metadata(page, base_url)
staticmethod
Extract and format page metadata.
Source code in src/embedding/datasources/confluence/document.py
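As an illustration of metadata extraction, the sketch below builds a metadata dictionary from a page dictionary. The page layout (`id`, `title`, `_links.webui`) and the output key names are assumptions chosen for the example; the fields actually extracted are not listed here.

```python
def get_metadata(page, base_url):
    """Build a flat metadata dict from a page dict (hypothetical layout)."""
    return {
        "page_id": page["id"],
        "title": page["title"],
        # Join the instance base URL with the page's relative web link.
        "url": base_url.rstrip("/") + page["_links"]["webui"],
    }


page = {
    "id": "123",
    "title": "Onboarding",
    "_links": {"webui": "/spaces/ENG/pages/123"},
}
meta = get_metadata(page, "https://example.invalid/wiki/")
```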
_set_excluded_embed_metadata_keys()
Configure metadata keys to exclude from embeddings.
Identifies metadata keys not explicitly included in embedding processing and marks them for exclusion.
Source code in src/embedding/datasources/confluence/document.py
_set_excluded_llm_metadata_keys()
Configure metadata keys to exclude from LLM context.
Identifies metadata keys not explicitly included in LLM processing and marks them for exclusion.
Source code in src/embedding/datasources/confluence/document.py
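Both exclusion methods follow the same idea: every metadata key that is not explicitly included gets excluded. A minimal sketch, assuming hypothetical key names and include lists:

```python
def excluded_keys(metadata, included_keys):
    """Return the metadata keys that are not in the explicit include set."""
    return [key for key in metadata if key not in included_keys]


metadata = {
    "title": "Onboarding",
    "space": "ENG",
    "url": "https://example.invalid/wiki/x",
    "last_modified": "2024-01-01",
}

# Hypothetical include lists: embeddings see only the title,
# while the LLM context also sees the URL.
embed_excluded = excluded_keys(metadata, {"title"})
llm_excluded = excluded_keys(metadata, {"title", "url"})
```

Expressing the configuration as an include list means that newly added metadata fields are excluded by default.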
from_page(page, base_url)
classmethod
Create ConfluenceDocument instance from page data.
Source code in src/embedding/datasources/confluence/document.py
Manager
ConfluenceDatasourceManager
Bases: DatasourceManager
Manager for Confluence content extraction and processing.
Handles document retrieval, cleaning, splitting and embedding updates for Confluence workspace content. Implements the base DatasourceManager interface for Confluence-specific processing.
Source code in src/embedding/datasources/confluence/manager.py
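The retrieval, cleaning and splitting stages can be pictured as a simple pipeline. The sketch below uses hypothetical stand-in classes and an illustrative `extract` method; it omits the embedding update step and does not reflect the real DatasourceManager interface.

```python
import asyncio


class ReaderSketch:
    async def get_all_documents_async(self):
        return ["  page one  ", "", "page two"]


class CleanerSketch:
    def clean(self, documents):
        return [d.strip() for d in documents if d.strip()]


class SplitterSketch:
    def split(self, documents):
        # Stand-in: one "node" per word.
        return [word for d in documents for word in d.split()]


class ManagerSketch:
    """Chains reader -> cleaner -> splitter (embedding update omitted)."""

    def __init__(self, reader, cleaner, splitter):
        self.reader = reader
        self.cleaner = cleaner
        self.splitter = splitter

    async def extract(self):
        documents = await self.reader.get_all_documents_async()
        cleaned = self.cleaner.clean(documents)
        return self.splitter.split(cleaned)


manager = ManagerSketch(ReaderSketch(), CleanerSketch(), SplitterSketch())
nodes = asyncio.run(manager.extract())
```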
Reader
ConfluenceReader
Bases: BaseReader
Reader for extracting documents from Confluence spaces.
Implements document extraction from Confluence spaces, handling pagination and export limits. Retrieval is asynchronous; the synchronous get_all_documents method is not implemented.
Source code in src/embedding/datasources/confluence/reader.py
__init__(configuration, confluence_client)
Initialize the Confluence reader.
Source code in src/embedding/datasources/confluence/reader.py
_get_all_pages(space, limit)
Fetch all pages from a Confluence space.
Source code in src/embedding/datasources/confluence/reader.py
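Fetching every page of a space usually means requesting batches until the API returns no more results or the limit is hit. The sketch below shows that loop against a fake client; `FakeClient`, its `fetch_pages` method and the `batch_size` parameter are hypothetical and do not correspond to the real Confluence client API.

```python
class FakeClient:
    """Hypothetical in-memory client that serves pages in batches."""

    def __init__(self, total):
        self._pages = [{"id": str(i)} for i in range(total)]

    def fetch_pages(self, space, start, limit):
        return self._pages[start:start + limit]


def get_all_pages(client, space, limit=None, batch_size=3):
    """Collect pages batch by batch until exhausted or `limit` is reached."""
    pages = []
    start = 0
    while True:
        batch = client.fetch_pages(space, start=start, limit=batch_size)
        if not batch:
            break  # nothing left to fetch
        pages.extend(batch)
        start += len(batch)
        if limit is not None and len(pages) >= limit:
            return pages[:limit]  # trim any overshoot from the last batch
        if len(batch) < batch_size:
            break  # a short batch means this was the last one
    return pages


all_pages = get_all_pages(FakeClient(10), "ENG")
limited_pages = get_all_pages(FakeClient(10), "ENG", limit=5)
```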
_limit_reached(pages, limit)
staticmethod
Check if page limit has been reached.
Source code in src/embedding/datasources/confluence/reader.py
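A limit check of this kind can be very small. The sketch below assumes that a limit of None means "no limit", which is not confirmed here.

```python
def limit_reached(pages, limit):
    """Return True once the number of collected pages meets the limit."""
    # Assumption: limit=None disables the check entirely.
    return limit is not None and len(pages) >= limit


below = limit_reached(["p1"], 2)
at_limit = limit_reached(["p1", "p2"], 2)
unlimited = limit_reached(["p1", "p2"], None)
```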
get_all_documents()
Synchronously fetch all documents from Confluence.
Note
Not implemented - use get_all_documents_async instead.
Source code in src/embedding/datasources/confluence/reader.py
get_all_documents_async()
async
Asynchronously fetch all documents from Confluence.
Retrieves documents from all global spaces, respecting the export limit.
Source code in src/embedding/datasources/confluence/reader.py
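Iterating over spaces while honouring a global export limit might be sketched as follows. The `SPACES` data, the `fetch_space` coroutine and the `export_limit` parameter name are hypothetical stand-ins for the real client calls and configuration.

```python
import asyncio

# Hypothetical data: space key -> page identifiers.
SPACES = {"ENG": ["p1", "p2", "p3"], "OPS": ["p4", "p5"]}


async def fetch_space(space, limit):
    """Stand-in for an asynchronous per-space fetch."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    pages = SPACES[space]
    return pages if limit is None else pages[:limit]


async def get_all_documents_async(export_limit=None):
    """Collect documents space by space until the export limit is reached."""
    documents = []
    for space in SPACES:
        remaining = None if export_limit is None else export_limit - len(documents)
        if remaining is not None and remaining <= 0:
            break
        documents.extend(await fetch_space(space, remaining))
    return documents


documents = asyncio.run(get_all_documents_async(export_limit=4))
```

Passing the remaining budget to each per-space fetch keeps the total at or below the limit without fetching pages that would later be discarded.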
Splitter
ConfluenceSplitter
Bases: BaseSplitter
Source code in src/embedding/datasources/confluence/splitter.py
__init__(markdown_splitter)
The ConfluenceSplitter class is a concrete implementation of BaseSplitter that splits documents into text nodes.
:param markdown_splitter: MarkdownSplitter object for splitting documents
Source code in src/embedding/datasources/confluence/splitter.py
split(documents)
Split the given list of documents into text nodes using markdown_splitter. Documents should be in markdown format.
:param documents: List of Document objects
:return: List of TextNode objects
Source code in src/embedding/datasources/confluence/splitter.py
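The delegation to an injected markdown splitter can be sketched as below. `HeadingSplitter` is a hypothetical stand-in that cuts at markdown headings; the real MarkdownSplitter's method names and splitting strategy are not documented here, and plain strings are used in place of Document and TextNode objects.

```python
import re


class HeadingSplitter:
    """Hypothetical stand-in for MarkdownSplitter: one chunk per heading."""

    def split(self, text):
        # Split at the start of every line that begins with a markdown heading.
        parts = re.split(r"(?m)^(?=#{1,6} )", text)
        return [part.strip() for part in parts if part.strip()]


class SplitterSketch:
    def __init__(self, markdown_splitter):
        self.markdown_splitter = markdown_splitter

    def split(self, documents):
        """Flatten the chunks of all documents into one list of nodes."""
        nodes = []
        for document in documents:
            nodes.extend(self.markdown_splitter.split(document))
        return nodes


nodes = SplitterSketch(HeadingSplitter()).split(
    ["# A\ntext a\n## B\ntext b", "# C\ntext c"]
)
```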
Builders
ConfluenceCleanerBuilder
Builder for creating Confluence content cleaner instances.
Provides a factory method to create Cleaner objects for Confluence content.
Source code in src/embedding/datasources/confluence/builders.py
build()
staticmethod
Creates a content cleaner for Confluence.
Source code in src/embedding/datasources/confluence/builders.py
ConfluenceClientBuilder
Builder for creating Confluence API client instances.
Provides a factory method to create configured Confluence API clients.
Source code in src/embedding/datasources/confluence/builders.py
build(configuration)
staticmethod
Creates a configured Confluence API client.
Source code in src/embedding/datasources/confluence/builders.py
ConfluenceDatasourceManagerBuilder
Builder for creating Confluence datasource manager instances.
Provides a factory method to create a configured ConfluenceDatasourceManager with the components required for content processing.
Source code in src/embedding/datasources/confluence/builders.py
build(configuration, reader, cleaner, splitter)
staticmethod
Creates a configured Confluence datasource manager.
Source code in src/embedding/datasources/confluence/builders.py
ConfluenceReaderBuilder
Builder for creating Confluence reader instances.
Provides a factory method to create configured ConfluenceReader objects.
Source code in src/embedding/datasources/confluence/builders.py
build(configuration, confluence_client)
staticmethod
Creates a configured Confluence reader.
Source code in src/embedding/datasources/confluence/builders.py
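The builders on this page share one shape: a class with a static `build` method that receives its dependencies and returns a ready-to-use object. A minimal sketch of that pattern for the reader follows; all class names and the `export_limit` field are hypothetical stand-ins, not the real configuration schema.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical configuration object."""
    export_limit: int


@dataclass
class ClientSketch:
    """Hypothetical API client."""
    base_url: str


class ReaderSketch:
    def __init__(self, configuration, confluence_client):
        self.export_limit = configuration.export_limit
        self.client = confluence_client


class ReaderBuilderSketch:
    @staticmethod
    def build(configuration, confluence_client):
        """Create a reader from already-built dependencies."""
        return ReaderSketch(
            configuration=configuration,
            confluence_client=confluence_client,
        )


reader = ReaderBuilderSketch.build(
    ConfigSketch(export_limit=50),
    ClientSketch(base_url="https://example.invalid/wiki"),
)
```

Keeping construction in a static factory lets a dependency-injection setup supply the configuration and client without the reader knowing how they were created.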
ConfluenceSplitterBuilder
Source code in src/embedding/datasources/confluence/builders.py
build(markdown_splitter)
staticmethod
Builds a ConfluenceSplitter instance using MarkdownSplitter.
:param markdown_splitter: MarkdownSplitter object
:return: ConfluenceSplitter object
Source code in src/embedding/datasources/confluence/builders.py