
Robert C. Edgar on twitter

11-Aug-2018 New paper describes octave plots for visualizing alpha diversity.

12-Jun-2018 New paper shows that one in five taxonomy annotations in SILVA and Greengenes are wrong.

18-Apr-2018 New paper shows that taxonomy prediction accuracy is <50% for V4 sequences.

05-Oct-2017 PeerJ paper shows low accuracy of closed- and open-ref. QIIME OTUs.

22-Sep-2017 New paper shows 97% threshold is wrong, OTUs should be 99% full-length 16S, 100% for V4.

UPARSE tutorial video posted on YouTube. Make OTUs from MiSeq reads.



UC output file

USEARCH cluster format (UC) is a tab-separated text file. UC output is supported by clustering and database search. By convention, the .uc filename extension is used. Each line is either a comment (starts with #) or a record. Every input sequence generates one record (H, S or N); additional record types give information about clusters. If an input sequence matched a target sequence, then the alignment and the identity computed from that alignment are also provided. Fields that do not apply to a given record type are filled with an asterisk placeholder (*).
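The general layout described above (tab-separated fields, `#` comment lines, `*` placeholders) can be read with a few lines of Python. This is a minimal sketch, not part of USEARCH; the function name `iter_uc_records` is hypothetical. It maps `*` to `None` and splits only on tabs, so a label containing spaces is not broken apart.

```python
from typing import Iterable, Iterator, List, Optional


def iter_uc_records(lines: Iterable[str]) -> Iterator[List[Optional[str]]]:
    """Yield each UC record as a list of fields, with '*' placeholders mapped to None."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and comment lines (those starting with '#').
        if not line or line.startswith("#"):
            continue
        # Split on tabs only, so labels containing spaces stay intact.
        fields = line.split("\t")
        # '*' means "this field does not apply to this record type".
        yield [None if f == "*" else f for f in fields]
```

Typical use would be `for rec in iter_uc_records(open("clusters.uc")): ...`, where `clusters.uc` is a hypothetical output file name.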
Field   Description
1   Record type S, H, C or N (see table below).
2   Cluster number (0-based).
3   Sequence length (S, N and H) or cluster size (C).
4   For H records, percent identity with target.
5   For H records, the strand: + or - for nucleotides, . (period) for proteins.
6   Not used; parsers should ignore this field. Included for backwards compatibility.
7   Not used; parsers should ignore this field. Included for backwards compatibility.
8   Compressed alignment or the symbol '=' (equals sign). The = indicates that the query is 100% identical to the target sequence (field 10).
9   Label of query sequence (always present).
10   Label of target sequence (H records only).
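The field table above maps naturally onto a typed record. The sketch below is an illustration under stated assumptions, not an official parser: `UcRecord` and `parse_uc_line` are hypothetical names, each line is assumed to have exactly the 10 documented fields, and fields 2 and 3 are treated as optional in case a record type leaves them as `*`. Note that the table numbers fields from 1, while Python list indices start at 0.

```python
from dataclasses import dataclass
from typing import List, Optional


def _opt(value: str) -> Optional[str]:
    # '*' marks a field that does not apply to this record type.
    return None if value == "*" else value


def _opt_int(value: str) -> Optional[int]:
    v = _opt(value)
    return int(v) if v is not None else None


@dataclass
class UcRecord:
    rec_type: str                  # field 1: S, H, C or N
    cluster: Optional[int]         # field 2: cluster number (0-based)
    size_or_length: Optional[int]  # field 3: length (S, N, H) or cluster size (C)
    identity: Optional[float]      # field 4: percent identity (H only)
    strand: Optional[str]          # field 5: '+', '-' or '.' (H only)
    alignment: Optional[str]       # field 8: compressed alignment or '='
    query: str                     # field 9: query label (always present)
    target: Optional[str]          # field 10: target label (H only)

    @property
    def is_identical(self) -> bool:
        # '=' in field 8 means the query is 100% identical to the target.
        return self.alignment == "="


def parse_uc_line(line: str) -> UcRecord:
    f: List[str] = line.rstrip("\r\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(f)}")
    identity = _opt(f[3])
    return UcRecord(
        rec_type=f[0],
        cluster=_opt_int(f[1]),
        size_or_length=_opt_int(f[2]),
        identity=float(identity) if identity is not None else None,
        strand=_opt(f[4]),
        # f[5] and f[6] are fields 6 and 7, which are unused and skipped here.
        alignment=_opt(f[7]),
        query=f[8],
        target=_opt(f[9]),
    )
```

Fields 6 and 7 are deliberately dropped, following the advice that parsers should ignore them.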
Record   Description
H   Hit. Represents a query-target alignment. For clustering, indicates the cluster assignment for the query. If -maxaccepts > 1, there is still only one H record, giving the best hit. To get the other accepts, use another type of output file, or use the -uc_allhits option (requires version 6.0.217 or later).
S   Centroid (clustering only). There is one S record for each cluster; it gives the centroid (representative) sequence label in the 9th field. Redundant with the C record; provided for backwards compatibility.
C   Cluster record (clustering only). The 3rd field is set to the cluster size (number of sequences in the cluster) and the 9th field is set to the label of the centroid sequence.
N   No hit (for database search without clustering only). Indicates that no accepts were found. In the case of clustering, a query with no hits becomes the centroid of a new cluster and generates an S record instead of an N record.
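The record types above are enough to rebuild cluster membership from a clustering UC file: each S record names a centroid, each H record assigns a query to a cluster, and each C record gives the cluster size. The following is a minimal sketch with a hypothetical function name, `clusters_from_uc`. It assumes the default of one H record per query (no -uc_allhits); with extra H records the size check would fail. N records only occur in database search output and are simply collected.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def clusters_from_uc(
    lines: Iterable[str],
) -> Tuple[Dict[int, str], Dict[int, List[str]], List[str]]:
    """Return (centroid label per cluster, member labels per cluster, no-hit labels)."""
    centroids: Dict[int, str] = {}
    members: Dict[int, List[str]] = defaultdict(list)
    sizes: Dict[int, int] = {}
    no_hits: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        rec_type, query = f[0], f[8]
        if rec_type == "S":
            # Centroid: field 2 is its cluster; the centroid counts as a member.
            cluster = int(f[1])
            centroids[cluster] = query
            members[cluster].append(query)
        elif rec_type == "H":
            # Hit: in clustering output this assigns the query to cluster field 2.
            members[int(f[1])].append(query)
        elif rec_type == "C":
            # Cluster record: field 3 holds the cluster size.
            sizes[int(f[1])] = int(f[2])
        elif rec_type == "N":
            # No hit: only produced by database search, not clustering.
            no_hits.append(query)
    # Consistency check: C size should equal one S plus the H records of that cluster.
    for cluster, size in sizes.items():
        found = len(members.get(cluster, []))
        if found != size:
            raise ValueError(f"cluster {cluster}: C record says {size}, found {found}")
    return centroids, dict(members), no_hits
```

Because the S and C records are redundant, the cross-check is cheap and catches truncated or mixed-up files; a parser that only needs centroids could rely on either record type alone.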