Name-Generator

Female

    Male

      Gender Neutral


        How it Works

        The first thing we need is a list of names to base the new ones on. Conveniently, you can download the 1,000 most popular baby names from the U.S. government.

        Now that we have a word list to work from, the program needs to read in all the data and start analyzing it. The first thing it does is create one table per letter of the alphabet, each with an entry for every letter of the alphabet. Each table will hold the likelihoods of each letter being picked next.

        These likelihoods are determined by going through each word one letter at a time, checking what the following letter is, and adding 1 to that letter's position in the current letter's table. For example, if my name, "Aaron", were the first word analyzed, it would be stored as:

        A →
        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
        1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
        N →
        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        O →
        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
        0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
        R →
        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
        The remaining letters are omitted for readability, but every letter of the alphabet has its own table.
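
The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the generator's actual source: the function name `build_table`, the nested-dictionary layout, and the choice to skip any character outside A to Z are all assumptions.

```python
import string

ALPHABET = string.ascii_uppercase


def build_table(names):
    """Count, for each letter, how often each letter follows it."""
    # One row per current letter, one column per following letter.
    table = {cur: {nxt: 0 for nxt in ALPHABET} for cur in ALPHABET}
    for name in names:
        word = name.upper()
        # Walk over each pair of adjacent letters in the word.
        for cur, nxt in zip(word, word[1:]):
            # Assumption: pairs containing non A-Z characters are skipped.
            if cur in table and nxt in table:
                table[cur][nxt] += 1
    return table


table = build_table(["Aaron"])
# The "A" row now counts one following "A" and one following "R".
print(table["A"]["A"], table["A"]["R"], table["R"]["O"], table["O"]["N"])
```

With only "Aaron" analyzed, the "N" row stays all zeros, because "N" is the last letter and nothing follows it.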

        Now that we've fully built the table, we can start generating names. First, a random name length between 3 and 10 is chosen, as well as a random initial letter. Then we generate the next letter using the weights in our likelihoods table that correspond to the current letter.

        This step is then repeated to generate the rest of the letters in the word. Once a word has been generated, we just repeat the generation steps to produce however many words we want.
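
A minimal sketch of this generation loop, using Python's `random.choices` for the weighted pick. It includes a compact table builder so that it runs on its own. The names `build_table` and `generate_name` and the sample name list are hypothetical, and the uniform fallback for a letter that was never followed by anything is an assumption, since the description above does not say how that case is handled.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def build_table(names):
    """Count how often each letter follows each other letter."""
    table = {cur: {nxt: 0 for nxt in ALPHABET} for cur in ALPHABET}
    for name in names:
        word = name.upper()
        for cur, nxt in zip(word, word[1:]):
            if cur in table and nxt in table:
                table[cur][nxt] += 1
    return table


def generate_name(table, rng=random):
    """Generate one name by repeatedly sampling the next letter."""
    length = rng.randint(3, 10)   # both ends inclusive
    name = rng.choice(ALPHABET)   # random initial letter
    while len(name) < length:
        # Weights come from the row of the most recent letter.
        weights = [table[name[-1]][nxt] for nxt in ALPHABET]
        if sum(weights) == 0:
            # Assumption: if nothing ever followed this letter,
            # pick the next letter uniformly at random.
            weights = [1] * len(ALPHABET)
        name += rng.choices(ALPHABET, weights=weights, k=1)[0]
    return name.capitalize()


# Hypothetical stand-in for the real name list.
sample_names = ["Aaron", "Anna", "Nora", "Ronan"]
table = build_table(sample_names)
names = [generate_name(table, random.Random(seed)) for seed in range(5)]
print(names)
```

Passing a seeded `random.Random` instance makes the output repeatable, which is handy when comparing different word lists.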

        Once we've done all this for one gender's dataset, we repeat the whole process for the remaining two datasets, with the gender-neutral set simply being the combination of the male and female sets.
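
One way to picture the combined dataset: counting letter pairs over the merged name list gives the same result as adding the male and female counts together. The sketch below uses a `collections.Counter` keyed by letter pairs rather than the per-letter tables, purely for brevity. The helper name `pair_counts` and the two short name lists are hypothetical.

```python
from collections import Counter


def pair_counts(names):
    """Count every (current letter, next letter) pair in the names."""
    counts = Counter()
    for name in names:
        word = name.upper()
        counts.update(zip(word, word[1:]))
    return counts


# Hypothetical stand-ins for the real datasets.
female = ["Anna", "Nora"]
male = ["Aaron", "Ronan"]

# Adding the two Counters sums the counts pair by pair.
neutral = pair_counts(female) + pair_counts(male)

# Same result as counting over the concatenated name lists.
print(neutral == pair_counts(female + male))
```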

        What the Real Data Looks Like

        A visualization of the gender-neutral letter frequency data.

        Besides being visually appealing, this visualization shows something quite interesting about the letter frequency data.

        The arrows are colored according to the likelihood that the letter each one points at is picked next. The coloring shows that most letters have a very similar likelihood, apart from a few outliers. This is why, for example, you might oddly enough see three or more n's in a row: "n" has a high likelihood of being its own next letter.
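
To make the repeated-letter effect concrete, a row of counts can be turned into probabilities by dividing each count by the row total. The counts below are invented for illustration and are not taken from the real data. `row_probabilities` is a hypothetical helper name.

```python
def row_probabilities(row):
    """Convert one row of follow-up counts into probabilities."""
    total = sum(row.values())
    if total == 0:
        # No letter was ever observed after this one.
        return {letter: 0.0 for letter in row}
    return {letter: count / total for letter, count in row.items()}


# Invented counts of which letters follow "N" (not real data).
n_row = {"N": 30, "A": 40, "E": 20, "I": 10}
probs = row_probabilities(n_row)

p_nn = probs["N"]        # chance that "N" is followed by another "N"
p_triple = p_nn ** 2     # chance that an "N" is followed by two more
print(p_nn, p_triple)
```

Because each letter depends only on the one before it, the chance of a run is just the self-transition probability multiplied by itself, so a letter with a high self-transition weight produces long runs surprisingly often.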