create_list_without_duplicates
The current implementation of create_list_without_duplicates includes a relatively expensive check of whether an element is already in a list.
tools-python/src/spdx_tools/spdx/document_utils.py
Lines 51 to 57 in 8050fd9
Using a set to keep track of unique elements will speed up this function: membership testing in a list is O(n), while membership testing in a set is O(1) on average, so deduplication drops from O(n²) to O(n) overall.
I did a time comparison between the current function and a set-based version:
```python
from typing import List, Any
import timeit

# Original implementation
def create_list_without_duplicates_original(list_with_potential_duplicates: List[Any]) -> List[Any]:
    list_without_duplicates = []
    for element in list_with_potential_duplicates:
        if element not in list_without_duplicates:
            list_without_duplicates.append(element)
    return list_without_duplicates

# Optimized implementation
def create_list_without_duplicates_optimized(list_with_potential_duplicates: List[Any]) -> List[Any]:
    seen_elements = set()
    list_without_duplicates = []
    for element in list_with_potential_duplicates:
        if element not in seen_elements:
            seen_elements.add(element)
            list_without_duplicates.append(element)
    return list_without_duplicates

# Test data
test_data = list(range(1000)) * 10

# Measure time for the original implementation
original_time = timeit.timeit(lambda: create_list_without_duplicates_original(test_data), number=1000)

# Measure time for the optimized implementation
optimized_time = timeit.timeit(lambda: create_list_without_duplicates_optimized(test_data), number=1000)

print(f"Original implementation time: {original_time:.1f} seconds")
print(f"Optimized implementation time: {optimized_time:.1f} seconds")
```
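(For reference, the test data is 10,000 elements with 1,000 unique values, and each implementation is timed over 1,000 runs.)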
The results are:
```
Original implementation time: 35.6 seconds
Optimized implementation time: 0.2 seconds
```
That makes the original function about 178x slower. I'll put in a PR for discussion.
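As a side note, for hashable elements the same order-preserving deduplication can be written as a one-liner: since Python 3.7, dict preserves insertion order, so dict.fromkeys keeps the first occurrence of each element. A minimal sketch (not part of the proposed PR, and the function name is just for illustration):

```python
from typing import Any, List

def create_list_without_duplicates_fromkeys(list_with_potential_duplicates: List[Any]) -> List[Any]:
    # dict preserves insertion order (Python 3.7+), so this keeps the
    # first occurrence of each element, same as the loop-based versions.
    return list(dict.fromkeys(list_with_potential_duplicates))
```

Note that both this and the set-based version require the elements to be hashable; if the SPDX model objects passed through this function are not hashable, that may be why the original implementation used a list-membership check.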