beacon.objects package

Submodules

beacon.objects.angellist_miner module

class beacon.objects.angellist_miner.AngelListMiner[source]

Bases: object

Retrieves information about an AngelList user.

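Notable information we can gather includes blog_url, online_bio_url, twitter_url, facebook_url, linkedin_url, angellist_url, dribble_url, github_url, resume_url, profile picture, and full name. We may be able to parse the bio, what_i_do, and what_ive_built fields for email and username information.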

API Endpoints:

  • GET /users/search
    • slug - The URL slug of the desired user. i.e. https://angel.co/{slug}
    • md5 - An MD5 hex hash of the email address of the desired user.

Field Names (that we care about):

  • aboutme_url - A URL to the person’s about.me profile
  • behance_url - A URL to the person’s Behance.net profile
  • blog_url - A URL to the person’s blog, possibly a personal website
  • online_bio_url - A URL to the person’s online bio, possibly a personal website
  • twitter_url - A URL to the person’s Twitter profile
  • facebook_url - A URL to the person’s Facebook profile
  • linkedin_url - A URL to the person’s LinkedIn profile
  • angellist_url - A URL to the person’s AngelList profile (this is redundant)
  • github_url - A URL to the person’s GitHub profile
  • dribble_url - A URL to the person’s Dribbble profile
  • resume_url - A URL to the person’s resume. May be a pdf, docx, website, etc.
  • image - A URL to the person’s profile picture
  • name - The person’s full name, short name, or nickname
  • what_i_do - A blurb about the person’s career, may contain email/username information
  • what_ive_built - A blurb about the person’s achievements, may contain email/username information
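
The md5 lookup parameter could be computed as in the sketch below. The helper name angellist_md5 and the trim/lowercase normalization are assumptions made for illustration; the documentation above only states that the parameter is an MD5 hex hash of the email address.

```python
import hashlib


def angellist_md5(email_address):
    """Return the MD5 hex digest of an email address (hypothetical helper)."""
    # Assumption: trim whitespace and lowercase before hashing so that
    # cosmetic differences do not change the lookup key.
    normalized = email_address.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


digest = angellist_md5("  James.Bond@Example.com ")
print(digest)  # 32 lowercase hexadecimal characters
```

Whether AngelList expects the address to be normalized this way is not confirmed here, so the normalization step should be checked against the service before relying on it.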

beacon.objects.email_miner module

class beacon.objects.email_miner.EmailMiner[source]

Bases: object

An object to determine email address validity and similarity to an individual across a variety of popular public and private email services.

According to RFC 5321 section 3.5, the VRFY command exists to verify whether a username exists, and the reply may include the full name of the user. Most services disable the command for security reasons (e.g. to deter spammers), even for authenticated sessions. For example, Gmail answers VRFY with a noncommittal 252 reply:

rcpt to: <somereallylongemailaddressthatdoesntwork584@gmail.com>
250 2.1.5 OK i199sm3826946qhc.44 - gsmtp
vrfy <somereallylongemailaddressthatdoesntwork584@gmail.com>
252 2.1.5 Send some mail, I'll try my best d10sm3853854qhc.36 - gsmtp
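
The reply codes in a transcript like the one above could be interpreted roughly as follows. This is a sketch, not EmailMiner's actual logic: the helper names classify_vrfy_reply and vrfy and the three coarse outcomes are assumptions, while smtplib.SMTP.verify is the standard library call that issues VRFY.

```python
import smtplib


def classify_vrfy_reply(code):
    """Map an SMTP VRFY reply code to a coarse outcome (hypothetical helper)."""
    if code in (250, 251):
        return "verified"      # the server recognized the address
    if code == 252:
        return "unverifiable"  # the server will not say, but accepts mail
    if code in (550, 551, 553):
        return "rejected"      # the mailbox is unavailable or ambiguous
    return "unknown"


def vrfy(host, address, timeout=10):
    """Ask host to VRFY address. Needs network access to the mail server."""
    with smtplib.SMTP(host, timeout=timeout) as server:
        server.ehlo_or_helo_if_needed()
        code, _message = server.verify(address)
    return classify_vrfy_reply(code)


print(classify_vrfy_reply(252))  # prints "unverifiable"
```

Because most services answer 252 or disable VRFY outright, an "unverifiable" result says nothing about whether the mailbox exists.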
email_services = ['aol.com', 'atmail.com', 'fastmail.com', 'getanemailaddress.info', 'gmail.com', 'gmx.com', 'gmx.net', 'gmx.us', 'hushmail.com', 'hushmail.me', 'hush.com', 'hush.ai', 'mac.hush.com', 'icloud.com', 'me.com', 'lycos.com', 'mail.com', 'email.com', 'outlook.com', 'hotmail.com', 'protonmail.com', 'rediffmail.com', 'runbox.com', 'yahoo.com', 'yahdex.com', 'zoho.com']
get_email_addresses_with_usernames(usernames)[source]

Enumerate possible email addresses for usernames

Parameters:usernames – A list of usernames
Returns:A list of email addresses
is_valid_email_domain(domain)[source]
max_email_address_length = 254
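
A minimal sketch of how these members might fit together is shown below. The function bodies are guesses written for illustration, not the module's source: in particular, treating is_valid_email_domain as a membership test against the service list is an assumption, and the service list is shortened.

```python
# Shortened copy of the email_services list, for the example only.
EMAIL_SERVICES = ["gmail.com", "outlook.com", "yahoo.com"]
MAX_EMAIL_ADDRESS_LENGTH = 254


def is_valid_email_domain(domain):
    """Assumed behavior: a domain is valid if it is a known service."""
    return domain.lower() in EMAIL_SERVICES


def get_email_addresses_with_usernames(usernames):
    """Pair every username with every known service domain."""
    addresses = []
    for username in usernames:
        for domain in EMAIL_SERVICES:
            address = "{}@{}".format(username, domain)
            # Skip candidates longer than the maximum address length.
            if len(address) <= MAX_EMAIL_ADDRESS_LENGTH:
                addresses.append(address)
    return addresses


addresses = get_email_addresses_with_usernames(["james.bond", "jbond"])
print(addresses)
```

With two usernames and three domains this yields six candidate addresses, which is why the full service list multiplies quickly.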

beacon.objects.person module

class beacon.objects.person.Person(first_name, last_name, middle_name=None, domains=None, linkedin_url=None, angellist_url=None, twitter_url=None)[source]

Bases: object

An object to represent a person and their online presence.

has_middle_name()[source]

Determine if a Person has a middle name
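
For orientation, a class with this constructor signature might look like the sketch below. The attribute names mirror the documented parameters, but storing them this way and treating an empty or whitespace-only string as "no middle name" are assumptions.

```python
class Person(object):
    """Illustrative stand-in for beacon.objects.person.Person."""

    def __init__(self, first_name, last_name, middle_name=None, domains=None,
                 linkedin_url=None, angellist_url=None, twitter_url=None):
        self.first_name = first_name
        self.last_name = last_name
        self.middle_name = middle_name
        # Copy the list so callers cannot mutate our state by accident.
        self.domains = list(domains) if domains else []
        self.linkedin_url = linkedin_url
        self.angellist_url = angellist_url
        self.twitter_url = twitter_url

    def has_middle_name(self):
        # Assumption: None, "" and whitespace all count as no middle name.
        return bool(self.middle_name and self.middle_name.strip())


bond = Person("James", "Bond", middle_name="Herbert")
print(bond.has_middle_name())  # prints "True"
print(Person("James", "Bond").has_middle_name())  # prints "False"
```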

beacon.objects.person_locator module

class beacon.objects.person_locator.PersonLocator(person)[source]

Bases: object

An object used to locate the online presence of an individual.

_determine_usernames_from_urls()[source]

Mine the person’s URLs and save any usernames found. Modifies a dictionary of services to usernames on the person object.

{
    'LinkedIn': ['a_username'],
    'AngelList': ['a_username'],
    'Twitter': ['a_different_username'],
}
Returns:None
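
One way to build such a dictionary is sketched below, assuming Python 3 and assuming that the username is the last non-empty path segment of each profile URL. The function names and example URLs are hypothetical, and unlike the documented method this version returns the dictionary instead of modifying the person object.

```python
from urllib.parse import urlparse


def username_from_profile_url(url):
    """Return the last non-empty path segment of a profile URL, or None."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else None


def determine_usernames_from_urls(urls_by_service):
    """Map each service name to the usernames found in its profile URL."""
    usernames = {}
    for service, url in urls_by_service.items():
        if not url:
            continue  # the person has no profile URL for this service
        username = username_from_profile_url(url)
        if username:
            usernames.setdefault(service, []).append(username)
    return usernames


usernames = determine_usernames_from_urls({
    "LinkedIn": "https://www.linkedin.com/in/a_username",
    "AngelList": "https://angel.co/a_username",
    "Twitter": "https://twitter.com/a_different_username",
})
print(usernames)
```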
_discover_email_addresses_with_usernames(usernames)[source]

Discover valid email addresses using only the usernames in usernames.

Parameters:usernames – The usernames to use when discovering new email addresses
Returns:A list of new email addresses
_enumerate_full_name_representations()[source]

Enumerate a person’s full name in the common ways full names can be represented. Includes common nicknames for the person’s first name and middle name/initial when the person has a middle name. Full names generated are intended to be compared to full names obtained from API services, email servers, etc.

For example, variants of James Herbert Bond include, but aren’t limited to, the following:

  • James Bond
  • Bond James
  • James Herbert Bond
  • James H Bond
  • Jimmy Bond
  • Bond, James
  • Bond, James Herbert
  • Bond, Jim Herbert

Todo

  • Migrate to use the get_[f]ml_name_variations() functions.
  • Remove some of the noise via nickname probability mappings
Returns:None
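
The enumeration could be sketched as follows. The tiny NICKNAMES table, the function name, and the exact set of orderings are assumptions for illustration; the real method stores its results on the object and may generate more forms. The order-preserving de-duplication relies on dict ordering in Python 3.7 or later.

```python
# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"james": ["Jim", "Jimmy"]}


def enumerate_full_names(first_name, last_name, middle_name=None):
    """List common written forms of a person's full name."""
    given_names = [first_name] + NICKNAMES.get(first_name.lower(), [])
    variants = []
    for given in given_names:
        forms = [given]
        if middle_name:
            forms.append("{} {}".format(given, middle_name))     # full middle
            forms.append("{} {}".format(given, middle_name[0]))  # initial
        for form in forms:
            variants.append("{} {}".format(form, last_name))   # James Bond
            variants.append("{}, {}".format(last_name, form))  # Bond, James
        variants.append("{} {}".format(last_name, given))      # Bond James
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(variants))


names = enumerate_full_names("James", "Bond", "Herbert")
print(len(names), names[:4])
```

Every nickname multiplies the number of variants, which is the noise the nickname probability mapping in the Todo list is meant to reduce.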
_enumerate_probable_usernames()[source]

Build a simple list of usernames based on a person’s full name.

Limit our formatting to the special symbols . and _ and the alphanumeric characters a-zA-Z0-9. Usernames on services such as Gmail, Yahoo, Outlook.com, LinkedIn, AngelList, and Twitter are restricted to these characters even though the RFCs allow more (including Unicode in some cases).

Email: RFC 3696

Todo

  • Expand our variations to include numbers once we obtain age, birthday, etc
  • Translate non-latin characters to their latin equivalent
Returns:None
_locate_brute_force()[source]

Bluntly search for our person on the world wide web.

  1. Generate a set of likely usernames minus any usernames already searched for

  2. While we can obtain new usernames and email addresses:
    1. Find valid email addresses that report the same full name as our person
    2. If we were unable to find a user on one of the social services, use the new email addresses to attempt to find a user on that service

Updates self.person with the most accurate information we can locate

Warning

Can result in thousands of API calls to LinkedIn, AngelList, Twitter, etc. Use with caution when searching for lots of people simultaneously.

Returns:None
_mine_personal_information_from_social_services(email_addresses=None)[source]

Mine all of the social services for personal information. Use the person’s existing username dictionary if email_addresses is None.

Parameters:email_addresses – Email addresses to search for on each service
Returns:A dict() of information type to information objects: {'email_address': ['example@gmail.com']}
locate(brute_force=False)[source]

Intelligently search for our person on the world wide web. Only brute force if necessary.

  1. Use the usernames we parsed from the profile URLs to contact all Social Services

  2. While we can obtain new usernames and email addresses:
    1. Use the same usernames, along with email addresses obtained from the Social Services, to discover new email addresses
    2. If we were unable to find a user on one of the social services, use the new email addresses to attempt to find a user on that service
  3. If we still don’t have any email addresses or social service URLs, fall back to the brute-force search

Updates self.person with the most accurate information we can locate

Warning

brute_force can result in thousands of API calls to LinkedIn, AngelList, Twitter, etc. Use with caution when searching for lots of people simultaneously.

Parameters:brute_force – Attempt to brute force usernames, email addresses, and social profiles
Returns:None
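
The loop described in the steps above is essentially a fixed-point search: keep going until a round yields nothing new. The sketch below shows that shape with the services replaced by callables. Everything here is hypothetical (the function signature, the dictionary keys, and the fake callables), and unlike the documented method it returns its findings instead of updating self.person.

```python
def locate(usernames, mine, discover, brute_force=None):
    """Search until no new usernames or email addresses turn up.

    mine(usernames, emails) returns a dict with optional 'usernames'
    and 'email_addresses' lists; discover(usernames) returns a list of
    email addresses; brute_force() is used only as a last resort.
    """
    known_usernames = set(usernames)
    known_emails = set()
    new_usernames, new_emails = set(known_usernames), set()
    while new_usernames or new_emails:
        mined = mine(new_usernames, new_emails)
        found_usernames = set(mined.get("usernames", []))
        found_emails = set(mined.get("email_addresses", []))
        found_emails |= set(discover(known_usernames | found_usernames))
        # Only items not seen before trigger another round.
        new_usernames = found_usernames - known_usernames
        new_emails = found_emails - known_emails
        known_usernames |= new_usernames
        known_emails |= new_emails
    if not known_emails and brute_force is not None:
        known_emails |= set(brute_force())
    return known_usernames, known_emails


def fake_mine(usernames, emails):
    # Pretend one service reveals a second username for "jbond".
    if "jbond" in usernames:
        return {"usernames": ["james.bond"]}
    return {}


def fake_discover(usernames):
    return ["{}@gmail.com".format(name) for name in sorted(usernames)]


found_usernames, found_emails = locate(["jbond"], fake_mine, fake_discover)
print(sorted(found_usernames), sorted(found_emails))
```

Tracking only the newly found items per round keeps each service from being queried again with inputs it has already seen, which matters given the API call volume mentioned in the warning.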

beacon.objects.social_miner module

class beacon.objects.social_miner.SocialMiner[source]

Bases: object

An object to search LinkedIn, AngelList, and Twitter for accounts matching certain usernames and email addresses and gathering account information about that individual.

Note

Twitter does not support obtaining or searching for email addresses from any API endpoint. However, we may be able to parse the user’s own description for email addresses. It may also be possible to use the import my contacts feature to get around this limitation. Other notable information we can gather is the user’s name, profile picture, and banner picture.

Note

AngelList allows lookup by URL slug (i.e. the link text a user has chosen for their profile, e.g. James Bond could be using slug: james-bond) or MD5 hash of a user’s email address. Other notable information we can gather is blog_url, online_bio_url, twitter_url, facebook_url, linkedin_url, angellist_url, dribble_url, github_url, resume_url, profile picture, and full name. We may be able to parse the bio, what_i_do and what_ive_built, for email and username information.

Note

LinkedIn, for privacy and security reasons, has locked down its API and doesn’t allow searching for users. Period. It only exposes the controls necessary to write third-party apps that act on behalf of users, and only if authorized by that user. The only thing we could do is simulate interacting with the LinkedIn search while masquerading as a real person with an account and scrape the results.

Note

Both Twitter and LinkedIn offer a find my contacts feature to find people by email address. We might be able to find a way to programmatically do this.

  1. Create a new Gmail account
  2. Add all emails we think belong to the person via the Google API
  3. Add the new Gmail address to a LinkedIn profile via the LinkedIn API
  4. Call find my contacts on LinkedIn
  5. Receive valid contacts
  6. Build profile URLs for each contact
  7. Scrape profiles

Module contents