Python: Introduce the chat history reducer #10190
base: main
Conversation
A couple of questions raised, most importantly: why isn't this part of the whole framework instead of just the agents package?
python/semantic_kernel/agents/history/chat_history_truncation_reducer.py
python/semantic_kernel/agents/history/chat_history_summarization_reducer.py
python/semantic_kernel/agents/history/chat_history_reducer_extensions.py
python/samples/concepts/agents/chat_completion_history_reducer.py
python/semantic_kernel/agents/history/chat_history_summarization_reducer.py
    chat_history.messages.append(response)
    print(f"# {response.role} - {response.name}: '{response.content}'")

    index += 2
Why add 2 instead of 1? Is it because the message_count is the sum of user messages and assistant messages?
In the sample, the user inputs a number, the model responds with the next number (in Spanish, per the prompt), then we skip to the next...
user: 1
model: dos
user: 3
model: cuatro
...
Got it. A comment here would be nice, or the expected output of the sample.
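One way the clarifying comment could look in the sample loop (a sketch; the loop shape and variable names are assumptions, and the model call is replaced by a stand-in):

```python
# Sketch of the sample's chat loop. Each iteration sends one user message
# and receives one assistant reply, so the history grows by two messages;
# `index` therefore advances by 2 rather than 1.
def run_chat(user_inputs):
    history = []
    index = 1
    for user_input in user_inputs:
        history.append(("user", str(user_input)))
        # Stand-in for the model call (the real sample calls the chat service).
        response = ("assistant", f"reply-to-{user_input}")
        history.append(response)
        # +2: one user message plus one assistant message were just added.
        index += 2
    return history, index

history, index = run_chat([1, 3, 5])
```

After three turns the history holds six messages, matching the user/assistant alternation described above.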
    # If history was reduced, print summaries
    if is_reduced:
        self._print_summaries_from_front(chat_history.messages)
If the reducer is an instance of ChatHistoryTruncationReducer, it looks like this would not print anything.
?
The _print_summaries_from_xxx helpers look for the __summary__ attribute in the metadata of the messages. The ChatHistoryTruncationReducer doesn't place that attribute in the metadata, so the function won't print anything.
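The behavior described above can be sketched like this (the dict-based message shape is a simplification; real SK messages are ChatMessageContent objects):

```python
SUMMARY_METADATA_KEY = "__summary__"

def print_summaries_from_front(messages):
    """Collect messages whose metadata marks them as summaries.

    Summary messages sit at the front of a reduced history, so we stop at
    the first non-summary message. A truncation reducer never sets the
    key, so for truncated histories nothing is collected.
    """
    summaries = []
    for msg in messages:
        if msg.get("metadata", {}).get(SUMMARY_METADATA_KEY):
            summaries.append(msg["content"])
        else:
            break
    return summaries
```

With a summarized history the first message carries the key and is printed; with a truncated history the list comes back empty, matching the comment above.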
python/samples/concepts/chat_completion/simple_chatbot_with_truncation_history_reducer.py
    @abstractmethod
    async def next(self, agents: list["Agent"], history: list["ChatMessageContent"]) -> "Agent":
This is a breaking change. Maybe we should deprecate it first?
Agent framework is experimental, breaking changes can occur.
This is also to align with .Net, which made the change months ago. We never did, so it's best to align now.
@TaoChenOSU, btw, why do you consider this a breaking change? The agent group chat still calls into next, which exists as a concrete method in the base class. From there, it calls an overridden select_agent method to select the agent.
If someone subclasses this for a custom selection strategy, their implementation will break.
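A sketch of the compatibility concern (class and method names follow the discussion; the internals are simplified assumptions, not SK's actual implementation):

```python
from abc import ABC, abstractmethod

class SelectionStrategy(ABC):
    """Sketch of the new shape: next() is concrete and delegates."""

    async def next(self, agents, history):
        return await self.select_agent(agents, history)

    @abstractmethod
    async def select_agent(self, agents, history):
        """Subclasses now override this instead of next()."""

# A pre-existing custom strategy that overrode the old abstract next()
# no longer satisfies the base class: select_agent stays abstract, so
# instantiation fails with TypeError until the subclass is updated.
class LegacyStrategy(SelectionStrategy):
    async def next(self, agents, history):
        return agents[0]
```

Call sites going through next still work for updated subclasses, but any legacy subclass like the one above stops being instantiable, which is the breaking change being discussed.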
python/semantic_kernel/agents/strategies/selection/sequential_selection_strategy.py
    chat_history = ChatHistory(messages=messages)
    chat_history.add_system_message(self.summarization_instructions)

    settings = self.service.get_prompt_execution_settings_class()(service_id=self.service_id)
Question: why is the service_id required in this case?
We need to tie the execution settings to an AI service that was registered on the kernel. Passing in the service_id, which will either be DEFAULT or a specified value allows us to do that.
Since you're calling get_chat_message_content on the service directly, is there a need to specify the service id? The kernel doesn't play any role here.
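For illustration, the pattern under discussion (both classes here are simplified stand-ins, not the real SK types):

```python
class PromptExecutionSettings:
    """Simplified stand-in for SK's execution settings."""
    def __init__(self, service_id="default"):
        self.service_id = service_id

class ChatService:
    """Simplified stand-in for an SK chat completion service."""
    def __init__(self, service_id="default"):
        self.service_id = service_id

    def get_prompt_execution_settings_class(self):
        return PromptExecutionSettings

# The settings carry the service_id so a kernel could later resolve the
# matching registered service. When the service is invoked directly, the
# id is carried along but the kernel lookup never happens, which is the
# point the reviewer is raising.
service = ChatService(service_id="summarizer")
settings = service.get_prompt_execution_settings_class()(service_id=service.service_id)
```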
python/semantic_kernel/contents/history_reducer/chat_history_summarization_reducer.py
    message_index = total_count - target_count

    # Move backward to avoid cutting through function call/results
    while message_index >= offset_count:
Do we need to handle cases where the offset_count lands between a function call content and a function result content?
That is the purpose of this loop: so that we don't separate the two content types.
Oh no, what I meant was cases where offset_count sits between a function call and a function result. Say this is the history, older to newer:
0: system message, 1: user message, 2: assistant message with function call, 3: assistant message with function result, ...
When offset_count is 3, the minimum value of message_index is also 3, which will cut right between the two assistant messages.
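The scenario can be made concrete with a sketch, under the assumption (as described above) that the backward walk cannot go below offset_count; the pairing check is simplified, since the real code inspects content types:

```python
# History, older to newer (index: kind):
# 0: system, 1: user, 2: assistant function call, 3: assistant function
# result, then ordinary chat turns.
history_kinds = ["system", "user", "call", "result"] + ["chat"] * 7

def is_function_result(i):
    return history_kinds[i] == "result"

def reduction_index(total_count, target_count, offset_count):
    message_index = total_count - target_count
    # Move backward to avoid cutting through function call/results,
    # but never walk below offset_count.
    while message_index > offset_count and is_function_result(message_index):
        message_index -= 1
    return message_index
```

With offset_count 0 the walk steps past the function result and the pair stays together; with offset_count 3 the walk is pinned at 3 and the cut lands between the call and its result, which is exactly the concern raised.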
    older_range_start = 0 if self.use_single_summary else insertion_point
    older_range_end = truncation_index

    messages_to_summarize = extract_range(
Question:
Say my chat history currently has 11 messages, and my target count, threshold count, and offset count are 5, 5, and 1, respectively.
The insertion point will be 11, and the truncation index will be 6, assuming there are no function calls or user messages in the threshold window. Then the range will be [11, 6), which will create an empty list of messages, while we are trying to summarize the messages in [1, 6]. Am I missing something?
This edge case could occur if all messages in the history are marked with the SUMMARY_METADATA_KEY. This is the logic that exists in .Net. Let me look at improving it in Python. I see two ways that could work:

We mark only the newest summary message:

    # For each message in the history except the newly created summary:
    for msg in self.messages[:-1]:
        if SUMMARY_METADATA_KEY in msg.metadata:
            del msg.metadata[SUMMARY_METADATA_KEY]

Or we add a safeguard before we summarize:

    insertion_point = locate_summarization_boundary(history)
    if insertion_point == len(history):
        # Fallback fix: force the boundary to something reasonable
        insertion_point = 0
I've been thinking about this more and I'm not super satisfied with these options. Since this appears to be an unlikely edge case, I do want to move forward with this summarizer and later iterate on it to improve the scenario. I have some thoughts but we should discuss them outside of this PR as improvements. I will create a work item to track.
Is what I described an edge case? It seems like it's going to be common in agent group chat, where all the messages in the threshold window are assistant messages without function content.
Do we have test cases to verify?
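The arithmetic in the question, as a runnable check (extract_range here is a hypothetical stand-in with the half-open-range semantics described above):

```python
def extract_range(messages, start, end):
    # Hypothetical stand-in: a half-open slice [start, end).
    return messages[start:end]

messages = list(range(11))           # 11 messages
target_count = 5
insertion_point = 11                 # e.g. every message carries the summary key
truncation_index = len(messages) - target_count  # 6

older_range_start = insertion_point  # use_single_summary = False
older_range_end = truncation_index

# [11, 6) is empty, so nothing would be summarized in this scenario.
messages_to_summarize = extract_range(messages, older_range_start, older_range_end)
```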
python/semantic_kernel/contents/history_reducer/chat_history_truncation_reducer.py
    logger.info("Performing chat history summarization check...")

    # 1. Identify where existing summary messages end
    insertion_point = locate_summarization_boundary(history)
Shouldn't we always skip a SYSTEM/DEVELOPER message if it's the first one?
python/semantic_kernel/contents/history_reducer/chat_history_reducer_utils.py
python/semantic_kernel/contents/history_reducer/chat_history_reducer_utils.py
python/semantic_kernel/contents/history_reducer/chat_history_reducer_utils.py
        self.threshold_count,
        offset_count=insertion_point,
    )
    if truncation_index < 0:
Suggested change:
    - if truncation_index < 0:
    + if truncation_index is None:
python/semantic_kernel/contents/history_reducer/chat_history_summarization_reducer.py
    async def _summarize(self, messages: list[ChatMessageContent]) -> ChatMessageContent | None:
        """Use the ChatCompletion service to generate a single summary message."""
        chat_history = ChatHistory(messages=messages)
        chat_history.add_system_message(self.summarization_instructions)
Given that o1 does not support a System message, should this be configurable?
Good point. Let me see.
    older_range_start = 0 if self.use_single_summary else insertion_point
    older_range_end = truncation_index

    messages_to_summarize = extract_range(
Another question here: why are a function call and its output not included in summaries? They might contain relevant background that was not used in the directly following response but was used in subsequent responses, which would then be missing context (think multiple sequential FCCs with the summary landing in the middle).
Motivation and Context
The SK Python framework has been missing the ability to configure a chat history reducer of type ChatHistoryTruncationReducer or ChatHistorySummarizationReducer, which have existed in the .Net SK Agent framework for some time. The goal of this PR is to introduce the chat history reducers and allow them to be used not only in the agent framework, but also in anything else that uses a chat history (chat completion, for example). The ChatHistoryReducer extends the ChatHistory class, so it's simple to include a reducer and the logic to reduce messages while managing the chat history, whether in an agent framework setting or in a chat completion setting.
Description
This PR:
- Introduces ChatHistoryTruncationReducer and ChatHistorySummarizationReducer.
- Adds a Chat Completion History Reducer sample to show how to configure both reducers and what each parameter does.
- Adds the select_agent abstract method so that one can define an initial agent to run in a particular scenario.
- Removes the FunctionCallBehavior class, along with some nasty circular dependencies that had been lurking in the code base for some time. FunctionCallBehavior has been marked with a deprecation warning for 6+ months now; all samples and docs have moved over to FunctionChoiceBehavior, so developers using FunctionCallBehavior should have had enough time to switch.
should have had enough time to switch.Contribution Checklist