Two years back, we wrote about the EmailMessage object here. This blog covers how the values stored in EmailMessage relate to email-to-case threading.
This blog is written for email header-based threading, not the older string-based threading; check out the difference here.
Email Header
There are 3 main pieces of information from the email header that determine email header-based threading.
- Message-ID: a unique ID for each email sent.
- In-Reply-To: when someone replies, replies all, or forwards an email, this is the Message-ID of the email being replied to or forwarded. It is not present in a brand new email.
- References: this contains ALL past Message-IDs related to the email, accumulated as the email is replied to or forwarded. It is not present in a brand new email.
Email header example in Outlook:
Message-ID:
 <SJ0PR04MB82591BAF72F77B70A462298BA1139@SJ0PR04MB8259.namprd04.prod.outlook.com>
In-Reply-To:
 <BYAPR04MB4423787D7E4151F21F0F7A0AAF109@BYAPR04MB4423.namprd04.prod.outlook.com>
References:
 <BN8PR04MB58573540EF22FB7F96C770C1BA0D9@BN8PR04MB5857.namprd04.prod.outlook.com>
 <BN8PR04MB5857B720DAEE1DD7536615ADBA0C9@BN8PR04MB5857.namprd04.prod.outlook.com>
 <DM8PR04MB8007D5BDCFA47E407444341FA50F9@DM8PR04MB8007.namprd04.prod.outlook.com>
 <DM6PR04MB64935C0BAD0CE8B45F2231DDDB0F9@DM6PR04MB6493.namprd04.prod.outlook.com>
 <BN8PR04MB585790F2E1E8B2EBC6B05310BA0F9@BN8PR04MB5857.namprd04.prod.outlook.com>
 <DM6PR04MB6493374129A5668B5722CFC0DB0E9@DM6PR04MB6493.namprd04.prod.outlook.com>
 <BYAPR04MB4423787D7E4151F21F0F7A0AAF109@BYAPR04MB4423.namprd04.prod.outlook.com>
Another sample from Gmail:
Message-ID:
<CAEA+Pe0BKtiu8O8TRC0vQC2gu+OmbR6PrEdS93+-0yOxhGzEVQ@mail.gmail.com>
In-Reply-To:<CAG1Mi6WPzXh_B3Aw5VcF8mJG1F7AjdqOZHw=Lc+p0+oiH4O2Aw@mail.gmail.com>
References:
<CADeEcsvRNwQxAP1qE4gqKeUGtqMLNR3qu+qcD4Q1p4K-CJKzqg@mail.gmail.com> <CAG1Mi6WPzXh_B3Aw5VcF8mJG1F7AjdqOZHw=Lc+p0+oiH4O2Aw@mail.gmail.com>
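To inspect these three headers yourself, here is a minimal sketch using Python's standard `email` package. The raw message and the `threading_headers` helper are made up for illustration; only the header names come from the samples above.

```python
import re
from email import message_from_string

# Hypothetical raw email; the IDs are placeholders, not real Message-IDs.
RAW = """\
Message-ID: <C3@mail.example.com>
In-Reply-To: <B2@mail.example.com>
References: <A1@mail.example.com>
 <B2@mail.example.com>
Subject: Re: hello

body
"""

# A Message-ID is wrapped in angle brackets and contains no whitespace.
MSG_ID = re.compile(r"<[^<>\s]+>")


def threading_headers(raw):
    """Return the three header values that drive header-based threading."""
    msg = message_from_string(raw)

    def ids(name):
        # Folded (multi-line) headers still contain every <...> token.
        return MSG_ID.findall(msg.get(name, ""))

    message_id = ids("Message-ID")
    in_reply_to = ids("In-Reply-To")
    return {
        "message_id": message_id[0] if message_id else None,
        "in_reply_to": in_reply_to[0] if in_reply_to else None,
        "references": ids("References"),
    }


print(threading_headers(RAW))
```

For a brand new email, `in_reply_to` comes back as `None` and `references` as an empty list.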
ThreadIdentifier
ThreadIdentifier is a field in the EmailMessage object that is auto-populated when emails are sent from or received in Salesforce, including manually logged emails.
For email-to-case, ThreadIdentifier is the Message-ID of the 1st email sent by a user as a brand new email (not a reply/forward), and that email may not have been sent to the email-to-case email address at all.
Example:
- A sends an email to B [Message-ID = A123]
- B replies to A, and also CCs the email-to-case email address [Message-ID = B888]
ThreadIdentifier in this scenario is A123, even though A did not send the email to the email-to-case email address. 
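One way to picture this is the sketch below. It assumes the root of a thread is the first entry in the References header, and that a brand new email is its own root. How Salesforce actually derives the value is not documented in this blog, so `guess_thread_identifier` is a hypothetical helper, not the platform's logic.

```python
def guess_thread_identifier(message_id, references):
    """Return the Message-ID of the email that started the thread.

    Assumption: the first entry in References is the thread root;
    a brand new email has no References, so it is its own root.
    """
    return references[0] if references else message_id


# A sends a brand new email to B
email_a = {"message_id": "A123", "references": []}
# B replies to A and CCs the email-to-case address
email_b = {"message_id": "B888", "references": ["A123"]}

print(guess_thread_identifier(**email_a))  # A123
print(guess_thread_identifier(**email_b))  # A123
```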
Email-to-Case Threading
As per this article, 2 checks determine whether an email sent to an email-to-case alias will create a new case or be attached to an existing case:
1. Look for a "MessageIdentifier" in the Salesforce EmailMessage object that matches the "In-Reply-To" value in the email header. If none is found, continue to (2).
2. Look for a "MessageIdentifier" in the Salesforce EmailMessage object that matches a Message-ID in the "References" value in the email header.
If a match is found, the email is linked to the case in ParentId (which is the Case Id) of the matching EmailMessage record. So, the matching here is not based on ThreadIdentifier.
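The two checks can be sketched as a lookup over stored EmailMessage records. This is a toy model: the records are plain dictionaries, `find_parent_case` is a hypothetical name, and checking References from newest to oldest is my assumption.

```python
def find_parent_case(email_messages, in_reply_to, references):
    """Return the ParentId (Case Id) to attach to, or None for a new case."""
    # Index stored emails by MessageIdentifier. ThreadIdentifier is ignored.
    # If two records share a MessageIdentifier, the later one wins here.
    by_identifier = {}
    for record in email_messages:
        by_identifier[record["MessageIdentifier"]] = record["ParentId"]

    # Check 1: match on In-Reply-To.
    if in_reply_to in by_identifier:
        return by_identifier[in_reply_to]

    # Check 2: match on References (assumption: newest entry first).
    for reference in reversed(references):
        if reference in by_identifier:
            return by_identifier[reference]

    return None


stored = [
    {"ParentId": "CASE-1", "MessageIdentifier": "B888", "ThreadIdentifier": "A123"},
]

# Reply to B888: matched by In-Reply-To.
print(find_parent_case(stored, "B888", ["A123", "B888"]))  # CASE-1
# Reply to A123: same thread, but A123 is not a stored MessageIdentifier.
print(find_parent_case(stored, "A123", ["A123"]))  # None
```

The second call returns `None` even though the stored record has ThreadIdentifier A123, which is the point made above: ThreadIdentifier plays no part in the match.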
First scenario:
1. A creates a brand new email and sends it to B and C
2. B replies all and forwards the email to email_alias@domain.com
3. C replies all and forwards the email to email_alias@domain.com
The above scenario will create 2 cases:
- both cases have the same ThreadIdentifier, but the MessageIdentifier values generated from email B (2) and email C (3) are different
- the original email A (1) is not stored in Salesforce, so references to it in later emails will not match any MessageIdentifier, and these 2 cases will continue to run independently.
Second scenario:
1. A creates a brand new email, sends it to B, also to email_alias_1@domain.com and email_alias_2@domain.com 
2. B replies all
What will happen from the above scenario:
- The first email from A will generate 2 cases with the same MessageIdentifier
- When B hits reply all, 2 streams of email reach Salesforce, both looking for the same MessageIdentifier. The system will simply attach the email to either of these 2 cases, but in my trial both streams were attached to the last case created, so there will be 3 emails in the last case created
Third scenario:
1. A creates a brand new email and sends it to B, also to email_alias_1@domain.com  
2. B replies all and adds email_alias_2@domain.com
What will happen from the above scenario:
- The first email from A will generate 1 new case
- When B hits reply all and adds a new email alias linked to email-to-case, the "In-Reply-To:" in the second email will point to the Message-ID of the first email, which is the MessageIdentifier for the case created by the first email, so no new case is created here
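The three scenarios can be replayed with a small simulation. `Org` is a hypothetical stand-in for Salesforce, each call to `receive` represents one copy of an email arriving at one alias, and the "most recently created record wins" rule is an assumption taken from the trial in the second scenario, not documented behavior.

```python
class Org:
    """Toy stand-in for an org with email-to-case enabled."""

    def __init__(self):
        self.email_messages = []  # (case number, MessageIdentifier), oldest first
        self.case_count = 0

    def _match(self, identifier):
        if identifier is None:
            return None
        # Assumption: with duplicate MessageIdentifiers, the newest record wins.
        for case, stored_identifier in reversed(self.email_messages):
            if stored_identifier == identifier:
                return case
        return None

    def receive(self, message_id, in_reply_to=None, references=()):
        """Process one copy of an email and return the case it lands in."""
        case = self._match(in_reply_to)  # check 1: In-Reply-To
        if case is None:
            for reference in reversed(references):  # check 2: References
                case = self._match(reference)
                if case is not None:
                    break
        if case is None:  # no match: create a new case
            self.case_count += 1
            case = self.case_count
        self.email_messages.append((case, message_id))
        return case


# First scenario: A1 never reaches Salesforce; B and C each reply to the alias.
org = Org()
first = (org.receive("B2", "A1", ["A1"]), org.receive("C3", "A1", ["A1"]))

# Second scenario: A1 goes to two aliases, then B's reply-all reaches both.
org = Org()
second = [org.receive("A1"), org.receive("A1"),
          org.receive("B2", "A1", ["A1"]), org.receive("B2", "A1", ["A1"])]

# Third scenario: A1 goes to one alias, B's reply-all adds a second alias.
org = Org()
third = [org.receive("A1"),
         org.receive("B2", "A1", ["A1"]), org.receive("B2", "A1", ["A1"])]

print(first)   # (1, 2): two independent cases
print(second)  # [1, 2, 2, 2]: three emails in the last case created
print(third)   # [1, 1, 1]: no new case
```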
Sample query: Select ParentId, CreatedDate, Status, Subject, FromAddress, ToAddress, CcAddress, MessageIdentifier, ThreadIdentifier, Id, ReplyToEmailMessageId from EmailMessage where CreatedDate = Today
Reference: