CCP Elite Database (中共精英资料库)

Introduction

We are collecting biographical and career data on high-ranking officials at the central and provincial levels. The database currently contains 4,563 cadres, including all Central Committee members (中央委员) and Alternate Central Committee members (中央候补委员) from the 1st to the 18th Party Congress; Standing Committee members of provincial-level Party committees (省委常委) from 1976 to 2015; officials at the vice-ministerial level or equivalent (副部级之类); members of the Central Law and Politics Leading Group (中央政法委员); and high-ranking People's Liberation Army (解放军) officials from 1992 to 2015. We are continually updating the list of names, and the database will be published soon.

For each cadre, the database provides detailed biographical information: birth year, gender, ethnicity, county-level birthplace, education level, university alma maters, years of joining the CCP and of retirement, work experience before joining the CCP, revolutionary background and any history of being purged, and the full job trajectory since 1949. In particular, we have built an extensive set of job codes that can identify virtually every Party, military, and government department and sub-department from the central level down to the county level. The database also covers organs directly under the State Council, bureaus administered by each ministry, temporary or cooperative offices, Party and government leading small groups, mass organizations led by the CCP, and departments abolished since 1949. The codebook contains 2,411 administrative units, 1,040 universities and colleges, and 15,000 job specifications.
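To make this structure concrete, here is a minimal Python sketch of how one cadre record and its coded job spells might be represented. The field names, the `CODEBOOK` dictionary, the `U0000-P00` code format, and the example cadre are all hypothetical placeholders chosen for illustration; they are not the database's actual schema, codes, or data.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical codebook: job code -> (administrative unit, position).
# The code format and entries are illustrative, not the real codebook.
CODEBOOK = {
    "U0001-P01": ("Central Committee", "Member"),
    "U0420-P03": ("Guangdong Provincial Party Committee", "Standing Committee Member"),
    "U0877-P02": ("Ministry of Education", "Vice Minister"),
}


@dataclass
class Job:
    """One spell in a cadre's career trajectory."""
    code: str                       # key into CODEBOOK
    start_year: int
    end_year: Optional[int] = None  # None means ongoing or unknown

    def describe(self) -> str:
        # Fall back to placeholders when a code is missing from the codebook
        unit, position = CODEBOOK.get(self.code, ("Unknown unit", "Unknown position"))
        end = self.end_year if self.end_year is not None else "?"
        return f"{position}, {unit} ({self.start_year}-{end})"


@dataclass
class Cadre:
    """Biographical fields of the kind listed above (names are hypothetical)."""
    name: str
    birth_year: Optional[int]
    gender: Optional[str]
    ethnicity: Optional[str]
    birth_county: Optional[str]
    education: Optional[str]
    party_entry_year: Optional[int]
    jobs: list = field(default_factory=list)

    def career(self) -> list:
        # Return the job spells in chronological order of start year
        return [job.describe() for job in sorted(self.jobs, key=lambda j: j.start_year)]


# Usage with an invented record
c = Cadre("Example Cadre", 1950, "M", "Han", "Hypothetical County", "University", 1972,
          jobs=[Job("U0877-P02", 2003, 2008), Job("U0420-P03", 1998, 2003)])
print(c.career())
# ['Standing Committee Member, Guangdong Provincial Party Committee (1998-2003)', 'Vice Minister, Ministry of Education (2003-2008)']
```

Under this assumed design, storing spells as codes rather than free-text titles is what makes trajectories comparable across cadres: two spells in the same unit share a code even when source texts phrase the title differently.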

Building a database of this scale by hand alone would take far too much time and effort. I therefore wrote a Visual Basic script that automatically scrapes, parses, and compiles web-based biographical information from a variety of open-source web pages, including encyclopedias, newspapers, blogs, and government websites. The main sources will be cited along with the codebook. Working from unstructured text, the automated web scraper has achieved an accuracy rate of over 90% on demographic variables and a detection rate of around 80% on career variables. I am currently working on improving the accuracy and detection rates of the automated processes. After the program stores the scraped data on our local data server, we manually double-check and confirm the stored information. You can download the automated programs here.
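The two quality figures above measure different things. The sketch below illustrates one plausible reading of them, written in Python rather than the Visual Basic of the actual script: accuracy as the share of extracted demographic values that match the manually verified values, and detection rate as the share of verified career spells that the scraper found at all. These definitions, the regular expressions, and the function names `parse_birth_year`, `accuracy`, and `detection_rate` are assumptions for illustration only; the web-fetching step is omitted, and the sample biographies are invented.

```python
import re
from typing import Optional

# Hypothetical patterns for a single demographic field (birth year).
# Real biographies vary far more; these two patterns are illustrative only.
BIRTH_YEAR_PATTERNS = [
    re.compile(r"(\d{4})年\d{1,2}月生"),             # e.g. "1950年3月生"
    re.compile(r"born in (\d{4})", re.IGNORECASE),  # e.g. "born in 1950"
]


def parse_birth_year(text: str) -> Optional[int]:
    """Return the first birth year matched by any pattern, or None if none match."""
    for pattern in BIRTH_YEAR_PATTERNS:
        match = pattern.search(text)
        if match:
            return int(match.group(1))
    return None


def accuracy(extracted: dict, truth: dict) -> float:
    """Share of extracted (non-None) values that equal the verified value.

    Assumed definition: records where the scraper found nothing are excluded.
    """
    checked = [k for k, v in extracted.items() if v is not None and k in truth]
    if not checked:
        return 0.0
    return sum(extracted[k] == truth[k] for k in checked) / len(checked)


def detection_rate(found: set, verified: set) -> float:
    """Share of manually verified career spells the scraper detected (recall)."""
    if not verified:
        return 0.0
    return len(found & verified) / len(verified)


# Usage with invented inputs
bios = {"A": "张三，男，汉族，1950年3月生，河北人。",
        "B": "Li Si, born in 1948 ...",
        "C": "No date given."}
extracted = {k: parse_birth_year(v) for k, v in bios.items()}
truth = {"A": 1950, "B": 1947, "C": 1952}
print(extracted)                   # {'A': 1950, 'B': 1948, 'C': None}
print(accuracy(extracted, truth))  # 0.5

# Career spells as (cadre, job code, start year) tuples
found = {("A", "U1", 1990), ("A", "U2", 1995)}
verified = {("A", "U1", 1990), ("A", "U2", 1995), ("A", "U3", 2000)}
print(round(detection_rate(found, verified), 3))  # 0.667
```

Keeping the two metrics separate matters because a scraper can be highly accurate on the values it does extract while still missing many career spells entirely.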