beacon.objects package¶
Submodules¶
beacon.objects.angellist_miner module¶
-
class
beacon.objects.angellist_miner.
AngelListMiner
[source]¶ Bases:
object
Retrieves information about an AngelList user.
Notable information we can gather includes the user’s profile picture and full name. We may be able to parse the bio, what_i_do, and what_ive_built fields for email and username information.
API Endpoints:
- GET /users/search
  - slug - The URL slug of the desired user, i.e. https://angel.co/{slug}
  - md5 - An MD5 hex hash of the email address of the desired user.
Field Names (that we care about):
- aboutme_url - A URL to the person’s about.me profile
- behance_url - A URL to the person’s Behance.net profile
- blog_url - A URL to the person’s blog, possibly a personal website
- online_bio_url - A URL to the person’s online bio, possibly a personal website
- twitter_url - A URL to the person’s Twitter profile
- facebook_url - A URL to the person’s Facebook profile
- linkedin_url - A URL to the person’s LinkedIn profile
- angellist_url - A URL to the person’s AngelList profile (this is redundant)
- github_url - A URL to the person’s GitHub profile
- dribble_url - A URL to the person’s Dribbble profile
- resume_url - A URL to the person’s resume. May be a pdf, docx, website, etc.
- image - A URL to the person’s profile picture
- name - The person’s full name, short name, or nickname
- what_i_do - A blurb about the person’s career, may contain email/username information
- what_ive_built - A blurb about the person’s achievements, may contain email/username information
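The md5 lookup described above needs a hex digest of the email address. A minimal sketch follows, assuming the address is trimmed and lowercased before hashing (the documentation does not state which normalization the service expects). The helper names `email_md5` and `build_search_params` are hypothetical and not part of beacon.

```python
import hashlib


def email_md5(email_address: str) -> str:
    """Return the MD5 hex digest of an email address.

    Assumption: the address is stripped and lowercased first; the
    source does not say which normalization the endpoint expects.
    """
    normalized = email_address.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_search_params(slug=None, email_address=None):
    """Build query parameters for GET /users/search (slug or md5)."""
    if slug is not None:
        return {"slug": slug}
    if email_address is not None:
        return {"md5": email_md5(email_address)}
    raise ValueError("either slug or email_address is required")
```

Only the parameter names slug and md5 come from the documentation; how the request is sent and authenticated is left out here.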
beacon.objects.email_miner module¶
-
class
beacon.objects.email_miner.
EmailMiner
[source]¶ Bases:
object
An object to determine email address validity and similarity to an individual across a variety of popular public and private email services.
According to RFC 5321 section 3.5, the VRFY command exists to verify whether a username exists, and the reply may include the full name of the user. The command is commonly disabled on most services for security reasons (e.g. to deter spammers), even in authenticated sessions. For example:
    rcpt to: <somereallylongemailaddressthatdoesntwork584@gmail.com>
    250 2.1.5 OK i199sm3826946qhc.44 - gsmtp
    vrfy <somereallylongemailaddressthatdoesntwork584@gmail.com>
    252 2.1.5 Send some mail, I'll try my best d10sm3853854qhc.36 - gsmtp
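The reply codes in the transcript above can be interpreted with a small helper. This is a sketch only: the function names and the three result labels are hypothetical, and the grouping follows the RFC 5321 meanings of 250/251 (address confirmed) and 252 (cannot verify, but will accept the message).

```python
def parse_reply_code(reply_line: str) -> int:
    """Extract the three-digit SMTP reply code from a server reply line."""
    return int(reply_line.split(maxsplit=1)[0])


def classify_vrfy_reply(code: int) -> str:
    """Map a VRFY reply code to a coarse outcome (labels are illustrative)."""
    if code in (250, 251):
        return "verified"      # server confirmed the mailbox
    if code == 252:
        return "unverifiable"  # server declines to verify but accepts mail
    return "rejected"          # any other code is treated as a failure


reply = "252 2.1.5 Send some mail, I'll try my best d10sm3853854qhc.36 - gsmtp"
outcome = classify_vrfy_reply(parse_reply_code(reply))
```

In a live session the code could come from `smtplib.SMTP.verify(address)`, which returns a `(code, message)` tuple; whether EmailMiner uses smtplib is not stated in this page.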
-
email_services
= ['aol.com', 'atmail.com', 'fastmail.com', 'getanemailaddress.info', 'gmail.com', 'gmx.com', 'gmx.net', 'gmx.us', 'hushmail.com', 'hushmail.me', 'hush.com', 'hush.ai', 'mac.hush.com', 'icloud.com', 'me.com', 'lycos.com', 'mail.com', 'email.com', 'outlook.com', 'hotmail.com', 'protonmail.com', 'rediffmail.com', 'runbox.com', 'yahoo.com', 'yahdex.com', 'zoho.com']¶
-
get_email_addresses_with_usernames
(usernames)[source]¶ Enumerate possible email addresses for usernames.
Parameters: usernames – A list of usernames
Returns: A list of email addresses
-
max_email_address_length
= 254¶
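One plausible reading of get_email_addresses_with_usernames is a cross product of usernames and email_services, capped by max_email_address_length. The sketch below assumes exactly that; it is a standalone function with a shortened service list, not the actual method body.

```python
from itertools import product

# Shortened copy of the email_services attribute, for illustration only.
EMAIL_SERVICES = ["gmail.com", "outlook.com", "yahoo.com"]
MAX_EMAIL_ADDRESS_LENGTH = 254


def get_email_addresses_with_usernames(usernames, services=EMAIL_SERVICES):
    """Pair every username with every service domain, skipping overlong results."""
    addresses = []
    for username, domain in product(usernames, services):
        address = f"{username}@{domain}"
        if len(address) <= MAX_EMAIL_ADDRESS_LENGTH:
            addresses.append(address)
    return addresses
```

With the full 26-entry service list, every username yields up to 26 candidate addresses, so the candidate set grows quickly as usernames are added.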
-
beacon.objects.person module¶
beacon.objects.person_locator module¶
-
class
beacon.objects.person_locator.
PersonLocator
(person)[source]¶ Bases:
object
An object used to locate the online presence of an individual.
-
_determine_usernames_from_urls
()[source]¶ Mine the person’s URLs and save any usernames found. Updates a dictionary that maps services to usernames on the person object, for example:
{ 'LinkedIn': ['a_username'], 'AngelList': ['a_username'], 'Twitter': ['a_different_username'], }
Returns: None
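A rough sketch of how usernames might be mined from profile URLs is shown below. The host-to-service table and the rule "the username is the last path segment" are assumptions made for illustration; the real method may use different rules per service.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL host to service name.
SERVICE_HOSTS = {
    "linkedin.com": "LinkedIn",
    "angel.co": "AngelList",
    "twitter.com": "Twitter",
}


def determine_usernames_from_urls(urls):
    """Return {service: [usernames]} parsed from a list of profile URLs."""
    usernames = {}
    for url in urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        service = SERVICE_HOSTS.get(host)
        # Drop empty segments so trailing slashes do not matter.
        segments = [segment for segment in parsed.path.split("/") if segment]
        if service is None or not segments:
            continue
        found = usernames.setdefault(service, [])
        if segments[-1] not in found:
            found.append(segments[-1])
    return usernames
```

Unlike the documented method, this version returns the dictionary instead of modifying the person object, which keeps the example self-contained.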
-
_discover_email_addresses_with_usernames
(usernames)[source]¶ Discover valid email addresses using only the usernames in usernames.
Parameters: usernames – The usernames to use when discovering new email addresses
Returns: A list of new email addresses
-
_enumerate_full_name_representations
()[source]¶ Enumerate a person’s full name in the common ways full names can be represented. Includes common nicknames for the person’s first name and middle name/initial when the person has a middle name. Full names generated are intended to be compared to full names obtained from API services, email servers, etc.
For example, variants of James Herbert Bond include, but aren’t limited to, the following:
- James Bond
- Bond James
- James Herbert Bond
- James H Bond
- Jimmy Bond
- Bond, James
- Bond, James Herbert
- Bond, Jim Herbert
Todo
- Migrate to use the get_[f]ml_name_variations() functions.
- Remove some of the noise via nickname probability mappings
Returns: None
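The enumeration described above can be sketched as a product of first-name forms, middle-name forms, and orderings. The nickname table below is a hypothetical stand-in for whatever nickname source beacon uses, and the function returns a set rather than storing results on the person.

```python
# Hypothetical nickname table; the real mapping is not shown in this page.
NICKNAMES = {"james": ["Jim", "Jimmy"]}


def enumerate_full_names(first, last, middle=None):
    """Return a set of common representations of a person's full name."""
    firsts = [first] + NICKNAMES.get(first.lower(), [])
    middles = [None]
    if middle:
        middles += [middle, middle[0]]  # full middle name and initial

    variants = set()
    for first_form in firsts:
        for middle_form in middles:
            given = first_form if middle_form is None else f"{first_form} {middle_form}"
            variants.add(f"{given} {last}")    # James Herbert Bond
            variants.add(f"{last}, {given}")   # Bond, James Herbert
            variants.add(f"{last} {given}")    # Bond James Herbert
    return variants
```

For James Herbert Bond this produces 27 variants, which illustrates the noise the Todo item about nickname probability mappings is meant to reduce.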
-
_enumerate_probable_usernames
()[source]¶ Build a simple list of usernames based on a person’s full name.
Limit our formatting to only the special symbols in ._ and alphanumeric characters in a-zA-Z0-9. Usernames of services such as Gmail, Yahoo, Outlook.com, LinkedIn, AngelList, and Twitter are restricted to these characters despite RFCs allowing more characters (including Unicode in some cases).
Email: RFC 3696
Todo
- Expand our variations to include numbers once we obtain age, birthday, etc
- Translate non-Latin characters to their Latin equivalents
Returns: None
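A minimal sketch of username generation under the character restrictions above follows. The specific name combinations, the lowercasing, and the choice to drop candidates containing other characters are assumptions; the source only fixes the allowed alphabet (a-zA-Z0-9 plus . and _).

```python
import re

ALLOWED = re.compile(r"^[a-zA-Z0-9._]+$")
SEPARATORS = ("", ".", "_")


def enumerate_probable_usernames(first, last):
    """Build candidate usernames from non-empty first and last names."""
    first, last = first.lower(), last.lower()
    # Hypothetical set of combinations: full names and initials, both orders.
    pairs = [(first, last), (last, first), (first[0], last), (first, last[0])]
    usernames = set()
    for left, right in pairs:
        for separator in SEPARATORS:
            candidate = f"{left}{separator}{right}"
            # Discard candidates with characters outside the allowed set.
            if ALLOWED.match(candidate):
                usernames.add(candidate)
    return sorted(usernames)
```

Names with non-Latin characters mostly fall through the filter here, which is the gap the second Todo item would close.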
-
_locate_brute_force
()[source]¶ Bluntly search for our person on the world wide web.
Generate a set of likely usernames, minus any usernames already searched for.
- While we can obtain new usernames and email addresses:
  - Find valid email addresses that report the same full name as our person.
  - If we were unable to find a user on one of the social services, use the new email addresses to attempt to find a user on that service.
Updates self.person with the most accurate information we can locate.
Warning
Can result in thousands of API calls to LinkedIn, AngelList, Twitter, etc. Use with caution when searching for lots of people simultaneously.
Returns: None
Mine all the social services for personal information. Use the person’s existing username dictionary if email_addresses is None.
Parameters: email_addresses – Email addresses to search for on each service
Returns: A dict() of information type to information objects: {'email_address': ['example@gmail.com']}
-
locate
(brute_force=False)[source]¶ Intelligently search for our person on the world wide web. Only brute force if necessary.
Use the usernames we parsed from the profile URLs to contact all social services.
- While we can obtain new usernames and email addresses:
  - Use the same usernames, along with email addresses obtained from the social services, to discover new email addresses.
  - If we were unable to find a user on one of the social services, use the new email addresses to attempt to find a user on that service.
If we still don’t have any email addresses or social service URLs, fall back to a brute-force locate.
Updates self.person with the most accurate information we can locate.
Warning
brute_force can result in thousands of API calls to LinkedIn, AngelList, Twitter, etc. Use with caution when searching for lots of people simultaneously.
Parameters: brute_force – Attempt to brute force usernames, email addresses, and social profiles
Returns: None
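The "while we can obtain new usernames and email addresses" loop reads like a fixed-point iteration: keep querying until a pass discovers nothing new. The sketch below illustrates that idea only. `search_services` and `discover_emails` are hypothetical callables standing in for the social-service and email lookups, and the brute-force fallback is omitted.

```python
def locate(seed_usernames, search_services, discover_emails):
    """Iterate until no new usernames or email addresses are discovered.

    search_services(usernames, emails) -> (found_usernames, found_emails)
    discover_emails(usernames) -> found_emails
    Both callables are assumptions made for this sketch.
    """
    usernames = set(seed_usernames)
    emails = set()
    new_usernames, new_emails = set(usernames), set()

    while new_usernames or new_emails:
        # Query the services only with what was learned in the last pass.
        found_usernames, found_emails = search_services(new_usernames, new_emails)
        found_emails = set(found_emails) | set(
            discover_emails(usernames | set(found_usernames))
        )
        # Keep only results we have not seen before.
        new_usernames = set(found_usernames) - usernames
        new_emails = found_emails - emails
        usernames |= new_usernames
        emails |= new_emails

    return usernames, emails
```

Because each pass must add at least one unseen username or address to continue, the loop stops once the lookups stop producing new results.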
-
beacon.objects.social_miner module¶
Bases:
object
An object to search LinkedIn, AngelList, and Twitter for accounts matching certain usernames and email addresses and gathering account information about that individual.
Note
Twitter does not support obtaining or searching for email addresses from any API endpoint. However, we may be able to parse the user’s own description for email addresses. It may be possible to use the import my contacts feature somehow to get around this limitation. Other notable information we can gather is the user’s name, profile picture, and banner picture.
Note
AngelList allows lookup by URL slug (i.e. the link text a user has chosen for their profile, e.g. James Bond could be using slug: james-bond) or MD5 hash of a user’s email address. Other notable information we can gather is blog_url, online_bio_url, twitter_url, facebook_url, linkedin_url, angellist_url, dribble_url, github_url, resume_url, profile picture, and full name. We may be able to parse the bio, what_i_do and what_ive_built, for email and username information.
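Both notes above mention parsing free-text fields (a Twitter description, or AngelList’s bio, what_i_do, and what_ive_built) for email addresses. A minimal sketch with a deliberately simple regular expression follows. The pattern is an assumption and does not cover every address allowed by the RFCs; the function name is hypothetical.

```python
import re

# Simplified pattern: local part, "@", domain labels, and a final alphabetic label.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_email_addresses(text):
    """Return unique, lowercased email addresses in order of appearance."""
    addresses = []
    for match in EMAIL_PATTERN.findall(text or ""):
        address = match.lower()
        if address not in addresses:
            addresses.append(address)
    return addresses


bio = "Spy. Reach me at James.Bond@example.com or jb@example.com."
found = extract_email_addresses(bio)
```

The local part of each match could also be fed back in as a candidate username, in line with the "email and username information" the notes mention.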
Note
LinkedIn, for privacy and security reasons, has locked down its API and doesn’t allow searching for users. Period. It only exposes the controls necessary to write third-party apps that act on behalf of users, and only if authorized by that user. The only thing we could do is simulate interacting with LinkedIn search while masquerading as a real person with an account, then scrape the results.
Note
Both Twitter and LinkedIn offer a find my contacts feature to find people by email address. We might be able to find a way to programmatically do this.
- Create a new Gmail account
- Add all emails we think belong to the person via the Google API
- Add the new Gmail address to the LinkedIn profile via the LinkedIn API
- Call find my contacts on LinkedIn
- Receive valid contacts
- Build profile URLs for each contact
- Scrape profiles