Logs should contain the importer type, such as github, bitbucket, or bitbucket_server. You can find a full list of import sources in Gitlab::ImportSources.
Logs should include any information likely to aid in debugging:
Object identifiers such as id, iid, and type of object
Error or status messages
Logs should not include sensitive or private information, including but not limited to:
Usernames
Email addresses
Where applicable, we should track the error in Gitlab::Import::ImportFailureService to aid in displaying errors in the UI.
Logging should raise an error in development if key identifiers are missing, as demonstrated in this MR.
A log line should be created before and after each record is imported, containing that record’s identifier.
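The logging guidelines above can be sketched in plain Ruby. This is only an illustration, not GitLab's logger: the module name ImportLogSketch, the required and sensitive key lists, and the development flag are all hypothetical names chosen for the example.

```ruby
require 'json'

# Hypothetical helper, not part of GitLab: builds a structured log payload
# for an importer and rejects payloads that lack key identifiers.
module ImportLogSketch
  # Assumed set of identifiers every log line must carry.
  REQUIRED_KEYS = %i[importer object_type object_id].freeze
  # Assumed set of fields that must never be logged.
  SENSITIVE_KEYS = %i[username email].freeze

  MissingIdentifierError = Class.new(StandardError)

  def self.build(payload, development: false)
    # Drop fields that could carry private information.
    safe = payload.reject { |key, _| SENSITIVE_KEYS.include?(key) }

    # Fail loudly in development so missing identifiers are caught early.
    missing = REQUIRED_KEYS.reject { |key| safe.key?(key) }
    if development && missing.any?
      raise MissingIdentifierError, "missing log keys: #{missing.join(', ')}"
    end

    safe
  end
end

entry = ImportLogSketch.build(
  { importer: 'github', object_type: 'issue', object_id: 42, iid: 7,
    message: 'starting import', email: 'user@example.com' },
  development: true
)
puts JSON.generate(entry)
```

In this sketch the email field is dropped before the line is emitted, and a payload without object_type or object_id raises only when the development flag is set.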
Performance
A cache with a default TTL of 24 hours should be used to prevent duplicate database queries and API calls.
Workers that loop over collections should be equipped with a progress pointer that allows them to pick up where they left off if interrupted.
Write-heavy workers should implement defer_on_database_health_signal to avoid saturating the database. However, at the time of writing, a known issue prevents us from using this.
We should enforce limits on worker concurrency to avoid saturating resources. You can find an example of this in the Bitbucket ParallelScheduling class.
Importers should be tested at scale on a staging environment, especially when implementing new functionality or enabling a feature flag.
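The progress pointer mentioned above can be illustrated with a minimal sketch. The class ResumableLoopSketch and its method names are hypothetical, and a Hash stands in for a persistent store such as Redis so that the example runs on its own.

```ruby
# Hypothetical sketch: a loop that records its position so a retry can resume.
# `store` stands in for a persistent cache; a Hash is used here for simplicity.
class ResumableLoopSketch
  def initialize(store, key)
    @store = store
    @key = key
  end

  # Yields each item that has not been processed yet, and advances the
  # pointer only after the block returns for that item.
  def each_remaining(items)
    start = @store.fetch(@key, 0)
    items.each_with_index do |item, index|
      next if index < start

      yield item
      @store[@key] = index + 1
    end
  end
end

store = {}
processed = []
items = %w[a b c d]
loop_sketch = ResumableLoopSketch.new(store, 'pointer')

interrupted = false
begin
  loop_sketch.each_remaining(items) do |item|
    if item == 'c' && !interrupted
      interrupted = true
      raise 'simulated interruption'
    end
    processed << item
  end
rescue RuntimeError
  # A real worker would be re-enqueued here.
end

# The second run skips the items that were already handled.
loop_sketch.each_remaining(items) { |item| processed << item }
puts processed.inspect
```

Because the pointer advances only after an item completes, an interruption in the middle of an item causes that item to be processed again on resume, which is one reason the work done per item should be idempotent.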
Resilience
Workers should be idempotent so they can be retried safely in the case of failure.
Workers should be re-enqueued with a delay that respects concurrent batch limits.
Individual workers should not run for a long time. Workers that run for a long time can be interrupted by Sidekiq due to a deploy, or be misidentified by StuckProjectImportJobsWorker as being part of an import that is stuck and should be failed.
If a worker must run for a long time, it must refresh its JID using Gitlab::Import::RefreshImportJidWorker to avoid being terminated by StuckProjectImportJobsWorker. It may also need to raise its Sidekiq max_retries_after_interruption. Refer to the GitHub importer implementation.
Workers that rely on cached values must implement fall-back mechanisms to fetch data in the event of a cache miss.
Re-fetch data if possible and performant.
Gracefully handle missing values.
Long-running workers should be annotated with worker_resource_boundary :memory to place them on a shard with a two-hour termination grace period. A long termination grace period is not a replacement for writing fast workers. Apdex SLO compliance can be monitored on the I&I team Grafana dashboard.
Workers that create data should not fail an entire import if a single record fails to import. They must log the appropriate error and make a decision on whether or not to retry based on the nature of the error.
Import Stage workers (which include StageMethods) and Advance Stage workers (which include Gitlab::Import::AdvanceStage) should have retries: 6 to make them more resilient to system interruptions. With exponential back-off, six retries span approximately 20 minutes. A higher retry count holds up an import for too long.
It should be possible to retry a portion of an import, for example re-importing missing issues without overwriting the entire destination project.
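The figure of roughly 20 minutes for six retries can be checked with a short calculation. This sketch assumes Sidekiq's default back-off of about (count ** 4) + 15 seconds before each retry, where count starts at 0, plus a small random jitter that is left out here. The helper name base_retry_delay is hypothetical, and a worker that overrides the retry delay would give different numbers.

```ruby
# Assumed Sidekiq default delay before retry number `count` (0-based),
# ignoring the random jitter term.
def base_retry_delay(count)
  (count**4) + 15
end

# Delays before each of the six retries, in seconds.
delays = (0...6).map { |count| base_retry_delay(count) }
total_seconds = delays.sum

puts delays.inspect
puts "total: #{total_seconds} s (~#{(total_seconds / 60.0).round(1)} min)"
```

Under these assumptions the base delays sum to 1069 seconds, or a little under 18 minutes. Jitter and job run time add to that, which is consistent with the approximate 20-minute figure.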
Consistency
Importers should fire callbacks after saving records. Problematic callbacks can be disabled for imports on an individual basis: